Safe/Allowed filename cleaner for .NET
Is there any standardized / libraried / tested way in .NET to take an arbitrary string and mangle it in such a way that it represents a valid file name?
Rolling my own char-replace function is easy enough, but I'd like something a little more robust and reused.
You can use Path.GetInvalidFileNameChars to find out which characters of the string are invalid, and either convert them to a valid character such as a hyphen or, if you need bidirectional conversion, replace each one with an escape token such as % followed by the hexadecimal representation of its Unicode code (I have actually used this technique once but don't have the code at hand right now).
EDIT: Just in case someone is interested, here is the code.
// Requires: using System.Globalization; using System.IO; using System.Text.RegularExpressions;
/// <summary>
/// Escapes an object name so that it is a valid filename.
/// </summary>
/// <param name="fileName">Original object name.</param>
/// <returns>Escaped name.</returns>
/// <remarks>
/// All characters that are not valid for a filename, plus "%" and ".", are converted into "%uuuu", where uuuu is the hexadecimal
/// unicode representation of the character.
/// </remarks>
private string EscapeFilename(string fileName)
{
char[] invalidChars=Path.GetInvalidFileNameChars();
// Replace "%", then replace all other characters, then replace "."
fileName=fileName.Replace("%", "%0025");
foreach(char invalidChar in invalidChars)
{
fileName=fileName.Replace(invalidChar.ToString(), string.Format("%{0:X4}", (int)invalidChar));
}
return fileName.Replace(".", "%002E");
}
/// <summary>
/// Unescapes an escaped file name so that the original object name is obtained.
/// </summary>
/// <param name="escapedName">Escaped object name (see the EscapeFilename method).</param>
/// <returns>Unescaped (original) object name.</returns>
public string UnescapeFilename(string escapedName)
{
//We need to temporarily replace %0025 with %! to prevent a name
//that originally contained escape-like sequences from being unescaped incorrectly
//(for example, ".%002E" once escaped is "%002E%0025002E";
//without this temporary replacement it would be unescaped to "..").
string unescapedName=escapedName.Replace("%0025", "%!");
Regex regex=new Regex("%(?<esc>[0-9A-Fa-f]{4})");
Match m=regex.Match(escapedName);
while(m.Success)
{
foreach(Capture cap in m.Groups["esc"].Captures)
unescapedName=unescapedName.Replace("%"+cap.Value, Convert.ToChar(int.Parse(cap.Value, NumberStyles.HexNumber)).ToString());
m=m.NextMatch();
}
return unescapedName.Replace("%!", "%");
}
This problem is not as simple as you may think. Not only are the characters in Path.GetInvalidFileNameChars illegal, but there are also several filenames, such as "PRN" and "CON", that are reserved by Windows and cannot be created. Any name that ends in "." is also illegal in Windows. Moreover, there are various length limitations. Read the full list here.
If that's not enough, different filesystems have different limitations; for example, ISO 9660 filenames cannot start with "-" but can contain it.
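To illustrate the Windows-specific points above, here is a minimal sketch, not a complete or authoritative solution. The class name FileNameSketch, the method Sanitize, the "_" replacement character, and the 255-character default limit are all assumptions made for illustration. The reserved list covers only the commonly documented device names, and filesystem-specific rules such as ISO 9660 are not handled.

```csharp
using System;
using System.IO;
using System.Linq;

public static class FileNameSketch
{
    // Commonly documented Windows device names: CON, PRN, AUX, NUL, COM1-COM9, LPT1-LPT9.
    private static readonly string[] Reserved =
        new[] { "CON", "PRN", "AUX", "NUL" }
            .Concat(Enumerable.Range(1, 9).Select(i => "COM" + i))
            .Concat(Enumerable.Range(1, 9).Select(i => "LPT" + i))
            .ToArray();

    // maxLength is an assumed limit and should be well above 4,
    // otherwise truncation could itself produce a reserved name.
    public static string Sanitize(string name, int maxLength = 255)
    {
        char[] invalid = Path.GetInvalidFileNameChars();

        // Replace every character the current platform rejects.
        string cleaned = new string(
            name.Select(c => Array.IndexOf(invalid, c) >= 0 ? '_' : c).ToArray());

        // Windows does not allow names ending in '.'; trailing spaces are trimmed too (assumption).
        cleaned = cleaned.TrimEnd('.', ' ');

        // Reserved names are checked case-insensitively, with or without an extension.
        string stem = cleaned.Split('.')[0];
        if (Reserved.Contains(stem, StringComparer.OrdinalIgnoreCase))
            cleaned = "_" + cleaned;

        // Truncate, then trim again in case the cut exposed a trailing '.' or space.
        if (cleaned.Length > maxLength)
            cleaned = cleaned.Substring(0, maxLength).TrimEnd('.', ' ');

        // Never return an empty name.
        return cleaned.Length == 0 ? "_" : cleaned;
    }
}
```

Because Path.GetInvalidFileNameChars is platform-dependent, the character replacement step gives different results on Windows and on Unix-like systems, while the reserved-name and trailing-dot steps are applied everywhere.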
Can you provide more detail on what you mean by "generate from an arbitrary string"? Based on what you're saying, it sounds like you're asking
Is there any way to take an arbitrary string and mangle it in such a way that it represents a valid file name?
If that's the case, then no, there is no standard function available that I am aware of. However, you could use the following, which should do the trick:
public static string MakeValidFileName(string name) {
var invalid = Path.GetInvalidFileNameChars();
var builder = new StringBuilder();
foreach ( var cur in name ) {
builder.Append(invalid.Contains(cur) ? '_' : cur);
}
return builder.ToString();
}
Have you had a look at Path.GetInvalidFileNameChars?
Found at Really Useful .NET Classes Part 1 - System.IO.Path
Just for the fun of it, I did it in one line:
Regex.Replace("http://codereview.stackexchange.com/questions/33851/how-can-i-improve-my-code/33857#33857", "[" + Regex.Escape(new string(Path.GetInvalidFileNameChars())) + "]", "_")