开发者

How to interpret an Excel number format string to determine whether the value should be parsed by DateTime.FromOADate

How can I create a function "bool IsDateTime" that will reliably determine whether an Excel number format string like "[$-409]h:mm:ss AM/PM;@" indicates that the numeric value is a DateTime that should be passed to DateTime.FromOADate?

I've figured out what the [$-409] is: Excel Number Format: What is "[$-409]"?. It's just a locale code.

I've also read a little about the number format string being separated into four format sections by semicolons: http://office.microsoft.com/en-us/excel-help/create-or-delete-a-custom-number-format-HP005199500.aspx?CTT=5&origin=HP005198679 and here http://www.ozgrid.com/Excel/excel-custom-number-formats.htm

For example, would it be reliable to simply search for occurrences of the date/time format characters like h,m,s,y,d? How might Excel interpret it?

In case the question is not clear... when you read an Excel file and look at a date/time value, you're actually looking at a plain old double-pre开发者_StackOverflow社区cision value, because that's how it's stored in Excel. To figure out whether it's an ordinary double or a double that should be passed to DateTime.FromOADate, you must interpret the custom number format string. So I am asking how to go about interpreting such a string, which may or may not refer to a date/time value, in order to determine whether the double-precision value should be converted to a DateTime value via DateTime.FromOADate. Furthermore, if successfully converted to a DateTime value, I would then need to convert the Excel number format string into an equivalent .NET DateTime format string so I could display the date/time value as Excel would via DateTime.ToString( converted_format_string ).


You can check if the cell contains any of the built in date formats by using the CELL function and returning the format. It will return "D" followed by a number if it is using a built in format.

For example:

=IF(LEFT(CELL("format", A1),1)="D",TRUE,FALSE)

For a more general case I would first check is the cell a number (ISNUMBER()) and within the range for a date (i.e., between 0 and TODAY() - which is 39296 today). I would then check the number format for the occurrence of at least one d, m, y, h, M or s as this should signify that you have a date in the cell.

Hope this helps,

Dave


I implemented a class to parse the Excel number format string. It looks at the first section (of four possible sections in the format string), and uses a Regex to capture date/time specific custom format characters such as "y", "m", "d", "h", "s", "AM/PM", and returns null if none are found. This first step simply decides whether the format string is meant for a date/time value, and leaves us with an object-oriented ordered list of logical date/time format specifiers for further processing.

Assuming it was decided that the format string is meant for a date/time value, the captured and classified values are sorted into the order they were found in the original format string.

Next, it applies Excel-specific formatting quirks, like deciding whether "m" means month or minute, interpreting it as "minute" only if it appears immediately after an "h" or before an "s" (literal text is allowed between them, so it's not exactly "immediately" before/after). Excel also forces 24-hour time for the "h" character if "AM/PM" is not also specified, so if "AM/PM" is not found, it uses the lowercase m (24-hour time in .NET), otherwise it converts it to a capital M (12-hour time in .NET). It also converts "AM/PM" to the .NET equivalent "tt", and blanks out conditional expressions, which cannot be included in a plain .NET DateTime format string.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Collections;

namespace utilities.data
{
    public enum NumberFormatCaptureType
    {
        Condition,
        LiteralText,
        Year,
        Month,
        Day,
        Hour,
        Minute,
        Second,
        AMPM
    }

    public class NumberFormatTypedCapture
    {
        private class ClassificationPair
        {
            public string Name;
            public NumberFormatCaptureType Type;
            public bool IndicatesDateTimeValue;
        }

        private static readonly Regex regex = new Regex( @"(?<c>\[[^]]*])*((?<y>yyyy|yy)|(?<m>mmmm|mmm|mm|m)|(?<d>dddd|ddd|dd|d)|(?<h>hh|h)|(?<s>ss|s)|(?<t>AM/PM)|(?<t>am/pm)|(?<l>.))*", RegexOptions.Singleline | RegexOptions.ExplicitCapture | RegexOptions.Compiled );
        private static readonly ClassificationPair[] classifications = new ClassificationPair[] {
            new ClassificationPair() {Name="c", Type=NumberFormatCaptureType.Condition, IndicatesDateTimeValue=false},
            new ClassificationPair() {Name="y", Type=NumberFormatCaptureType.Year, IndicatesDateTimeValue=true},
            new ClassificationPair() {Name="m", Type=NumberFormatCaptureType.Month, IndicatesDateTimeValue=true},
            new ClassificationPair() {Name="d", Type=NumberFormatCaptureType.Day, IndicatesDateTimeValue=true},
            new ClassificationPair() {Name="h", Type=NumberFormatCaptureType.Hour, IndicatesDateTimeValue=true},
            new ClassificationPair() {Name="s", Type=NumberFormatCaptureType.Second, IndicatesDateTimeValue=true},
            new ClassificationPair() {Name="t", Type=NumberFormatCaptureType.AMPM, IndicatesDateTimeValue=true},
            new ClassificationPair() {Name="l", Type=NumberFormatCaptureType.LiteralText, IndicatesDateTimeValue=false}
        };
        private Capture Capture;
        private string mutable_value;
        public NumberFormatCaptureType Type;

        public NumberFormatTypedCapture( Capture c, NumberFormatCaptureType t )
        {
            this.Capture = c;
            this.Type = t;
            mutable_value = c.Value;
        }

        public int Index
        {
            get {return Capture.Index;}
        }

        public string Value
        {
            get {return mutable_value;}
            set {mutable_value = value;}
        }

        public int Length
        {
            get {return mutable_value.Length;}
        }

        public static string ConvertToDotNetDateTimeFormat( string number_format )
        {
            string[] number_formats = number_format.Split( ';' );
            Match m = regex.Match( number_formats[0] );
            bool date_time_formatting_encountered = false;
            bool am_pm_encountered = false;

            //Classify the catured values into typed NumberFormatTypedCapture instances
            List<NumberFormatTypedCapture> segments = new List<NumberFormatTypedCapture>();
            foreach (ClassificationPair classification in classifications)
            {
                CaptureCollection captures = m.Groups[classification.Name].Captures;
                if (classification.IndicatesDateTimeValue && captures.Count > 0)
                {
                    date_time_formatting_encountered = true;
                    if (classification.Type == NumberFormatCaptureType.AMPM)
                        am_pm_encountered = true;
                }
                segments.AddRange( captures.Cast<Capture>().Select<Capture,NumberFormatTypedCapture>( (capture) => new NumberFormatTypedCapture( capture, classification.Type ) ) );
            }

            //Not considered a date time format unless it has at least one instance of a date/time format character
            if (!date_time_formatting_encountered)
                return null;

            //Sort the captured values in the order they were found in the original string.
            Comparison<NumberFormatTypedCapture> comparison = (x,y) => (x.Index < y.Index) ? -1 : ((x.Index > y.Index) ? 1 : 0);
            segments.Sort( comparison );

            //Begin conversion of the captured Excel format characters to .NET DateTime format characters
            StringComparer sc = StringComparer.CurrentCultureIgnoreCase;
            for (int i = 0; i < segments.Count; i++)
            {
                NumberFormatTypedCapture c = segments[i];
                switch (c.Type)
                {
                    case NumberFormatCaptureType.Hour: //In the absense of an the AM/PM, Excel forces hours to display in 24-hour time
                        if (am_pm_encountered)
                            c.Value = c.Value.ToLower(); //.NET lowercase "h" formats hourse in 24-hour time
                        else
                            c.Value = c.Value.ToUpper(); //.NET uppercase "H" formats hours in 12-hour time
                        break;
                    case NumberFormatCaptureType.Month: //The "m" (month) designator is interpretted as minutes by Excel when found after an Hours indicator or before a Seconds indicator.
                        NumberFormatTypedCapture prev_format_character = GetAdjacentDateTimeVariable( segments, i, -1 );
                        NumberFormatTypedCapture next_format_character = GetAdjacentDateTimeVariable( segments, i, 1 );
                        if ((prev_format_character != null && prev_format_character.Type == NumberFormatCaptureType.Hour) || (next_format_character != null && next_format_character.Type == NumberFormatCaptureType.Second))
                            c.Type = NumberFormatCaptureType.Minute; //Format string is already lowercase (Excel seems to force it to lowercase), so just leave it lowercase and set the type to Minute
                        else
                            c.Value = c.Value.ToUpper(); //Month indicator is uppercase in .NET framework
                        break;
                    case NumberFormatCaptureType.AMPM: //AM/PM indicator is "tt" in .NET framework
                        c.Value = "tt";
                        break;
                    case NumberFormatCaptureType.Condition: //Conditional formatting is not supported in .NET framework
                        c.Value = String.Empty;
                        break;
                    //case NumberFormatCaptureType.Text: //Merge adjacent text elements
                        //break;
                }
            }

            //Now that the individual captures have been blanked out or converted to the .NET DateTime format string, concatenate it all together than return the final format string.
            StringBuilder sb = new StringBuilder();
            foreach (NumberFormatTypedCapture c in segments)
                sb.Append( c.Value );
            return sb.ToString();
        }

        private static NumberFormatTypedCapture GetAdjacentDateTimeVariable( List<NumberFormatTypedCapture> captures, int current, int direction )
        {
        check_next:
            current += direction;
            if (current >= 0 && current < captures.Count)
            {
                NumberFormatTypedCapture capture = captures[current];
                if (capture.Type == NumberFormatCaptureType.Condition || capture.Type == NumberFormatCaptureType.LiteralText)
                    goto check_next;
                return capture;
            }
            return null;
        }
    }
}

The above class can be used in the following context to read string values into a DataTable from the columns in an Excel file that have non-null headers. Specifically, it attempts to acquire a valid DateTime instance, and if one is found, it attempts to construct a valid .NET DateTime format string from the Excel number format string. If both of the previous steps are successfuly, it stores the formatted date time string in the data table, and otherwise it converts whatever value is present to a string (ensuring to strip out rich text formatting first if present):

using (ExcelPackage package = new ExcelPackage( fileUpload.FileContent ))
{
    Dictionary<string,string> converted_dt_format_strings = new Dictionary<string,string>();
    ExcelWorksheet sheet = package.Workbook.Worksheets.First();
    int end_column = sheet.Dimension.End.Column;
    int end_row = sheet.Dimension.End.Row;

    DataTable datatable = new DataTable();

    //Construct columns
    int i_row = 1;
    List<int> valid_columns = new List<int>();
    for (int i_col = 1; i_col <= end_column; i_col++)
    {
        ExcelRange range = sheet.Cells[i_row, i_col];
        string field_name_text = range.IsRichText ? range.RichText.Text : (range.Value ?? String.Empty).ToString();
        if (field_name_text != null)
        {
            valid_columns.Add( i_col );
            datatable.Columns.Add( field_name_text, typeof(string) );
        }
    }

    int valid_column_count = valid_columns.Count;
    for (i_row = 2; i_row <= end_row; i_row++)
    {
        DataRow row = datatable.NewRow();
        for (int i_col = 0; i_col < valid_column_count; i_col++)
        {
            ExcelRange range = sheet.Cells[i_row, valid_columns[i_col]];

            //Attempt to acquire a DateTime value from the cell
            DateTime? d = null;
            try
            {
                if (range.Value is DateTime)
                    d = (DateTime)range.Value;
                else if (range.Value is double)
                    d = DateTime.FromOADate( (double)range.Value );
                else
                    d = null;
            }
            catch
            {
                d = null;
            }

            string field_value_text = range.IsRichText ? (range.RichText.Text ?? String.Empty) : (range.Value ?? String.Empty).ToString(); //Acquire plain text string version of the object, which will be used if a formatted DateTime string cannot be produced
            string field_value_dt_text = null;

            if (d.HasValue)
            {
                try
                {
                    string excel_number_format = range.Style.Numberformat.Format;
                    string date_time_format = null;
                    if (excel_number_format != null)
                    {
                        if (!converted_dt_format_strings.TryGetValue( excel_number_format, out date_time_format ))
                        {
                            date_time_format = NumberFormatTypedCapture.ConvertToDotNetDateTimeFormat( excel_number_format );
                            converted_dt_format_strings.Add( excel_number_format, date_time_format );
                        }
                        if (date_time_format != null) //Appears to have Date/Time formatting applied to it
                            field_value_dt_text = d.Value.ToString( date_time_format );
                    }   
                }
                catch
                {
                    field_value_dt_text = null;
                }
            }

            row[i_col] = (field_value_dt_text == null) ? field_value_text : field_value_dt_text;
        }
        datatable.Rows.Add( row );
    }
    return datatable;
}
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜