How do I split up a search string to allow for quoted text?

I want to build a list of strings from the text of a search field. Anything inside double quotes should be split out as its own entry.

Example:

sample' "string's are, more "text" making" 12.34,hello"pineapple sundays

Produces

sample' 
string's are, more_  //underscore shown to display space
text
 making
12.34
hello
pineapple
sundays

Edit: Here is my (somewhat) elegant solution, thanks for the help everyone!

Private Function GetSearchTerms(ByVal searchText As String) As String()
    'Clean search string of unwanted characters'
    searchText = System.Text.RegularExpressions.Regex.Replace(searchText, "[^a-zA-Z0-9""'.,= ]", "")

    'Guarantees the first entry will not be an entry in quotes if the searchkeywords starts with double quotes'
    Dim searches As String() = searchText.Replace("""", " "" ").Split("""")
    Dim myWords As System.Collections.Generic.List(Of String) = New System.Collections.Generic.List(Of String)
    Dim delimiters As String() = New String() {" ", ","}

    For index As Integer = 0 To searches.Length - 1
        'even is regular text, split up into individual search terms'
        If (index Mod 2 = 0) Then
            myWords.AddRange(searches(index).Split(delimiters, StringSplitOptions.RemoveEmptyEntries))
        Else
            'check for unclosed double quote, if so, split it up and add, space we added earlier will get split out'
            If (searches.Length Mod 2 = 0 And index = searches.Length - 1) Then
                myWords.AddRange(searches(index).Split(delimiters, StringSplitOptions.RemoveEmptyEntries))
            Else
                '2 double quotes found'
                'remove the 2 spaces that we added earlier'
                Dim myQuotedString As String = searches(index).Substring(1, searches(index).Length - 2)
                If (myQuotedString.Length > 0) Then
                    myWords.Add(myQuotedString)
                End If
            End If
        End If
    Next
    Return myWords.ToArray()
End Function

Oi, vb commenting is ugly, anyone know how to clean this up?


This is a more complex parsing problem than you fully appreciate. I suggest you look at the TextFieldParser class and the FileHelpers library: http://www.filehelpers.com/
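
For well-formed input, a minimal sketch of the TextFieldParser route might look like the following. The names QuotedFieldReader and ReadTerms are made up for illustration, and the code assumes a reference to the Microsoft.VisualBasic assembly. As far as I know, TextFieldParser expects a closing quote to be followed by a delimiter, so a string with stray or nested quotes like the one in the question would likely raise a MalformedLineException instead of being split.

```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class QuotedFieldReader
{
    // Hypothetical helper; assumes every opening quote has a matching
    // closing quote that is followed by a delimiter or the end of the text.
    public static string[] ReadTerms(string searchText)
    {
        using (var parser = new TextFieldParser(new StringReader(searchText)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(" ", ",");          // split on space and comma
            parser.HasFieldsEnclosedInQuotes = true; // keep "quoted text" together

            // ReadFields returns null when there is nothing left to read.
            return parser.ReadFields() ?? new string[0];
        }
    }
}
```

Consecutive delimiters come back as empty fields, so you may still want to filter those out afterwards.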


This is not a complete solution, since it is missing a few validation checks, but it has everything you need.

My CharOccurs() finds each occurrence of '"' and stores the positions in a list, in order.

public static List<int> CharOccurs(string stringToSearch, char charToFind)
        {
            List<int> count = new List<int>();
            int  chr = 0;
            while (chr != -1)
            {
                chr = stringToSearch.IndexOf(charToFind, chr);
                if (chr != -1)
                {
                    count.Add(chr);
                    chr++;
                }
                else
                {
                    chr = -1;
                }
            }
            return count;
        }

The code below is fairly self-explanatory. I take the part of the string that lies between the first and last matched quotes and split it on the '"' character only. Then I take substrings of the text outside the quotes and split them on ",", space and '"' characters. Please add validation checks wherever needed to make it generic.

string input = "sample' \"string's are, more \"text\" making\" 12.34,hello\"pineapple sundays";

            List<int> positions = CharOccurs(input, '\"');

            string within_quotes, outside_quotes;
            string[] arr_within_quotes;
            List<string> output = new List<string>();

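            // Take everything before the first quote; the original length of positions[0]-1 dropped a character when no space preceded the quote.
            output.AddRange(input.Substring(0, positions[0]).Split(new char[] { ' ', ',', '"' }, StringSplitOptions.RemoveEmptyEntries));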

            if (positions.Count % 2 == 0)
            {
                within_quotes = input.Substring(positions[0]+1, positions[positions.Count - 1] - positions[0]-1);
                arr_within_quotes = within_quotes.Split('"');
                output.AddRange(arr_within_quotes);
                output.AddRange(input.Substring(positions[positions.Count - 1] + 1).Split(new char[] { ' ', ',' }));
            }
            else
            {
                within_quotes = input.Substring(positions[0]+1, positions[positions.Count - 2] - positions[0]-1);
                arr_within_quotes = within_quotes.Split('"');
                output.AddRange(arr_within_quotes);
                output.AddRange(input.Substring(positions[positions.Count - 2] + 1).Split(new char[] { ' ', ',', '"' }));
            }
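
The same idea (text between quotes is kept whole, everything else is split on spaces and commas) can be sketched more compactly by splitting on '"' first and treating the odd-numbered segments as quoted. This is only an illustration under that assumption, not the code above. SearchTerms.Split is a hypothetical name, and an unclosed final quote is treated as ordinary text.

```csharp
using System;
using System.Collections.Generic;

public static class SearchTerms
{
    // Hypothetical helper: splits a search string into terms,
    // keeping the text between a pair of double quotes as one term.
    public static List<string> Split(string input)
    {
        var delimiters = new[] { ' ', ',' };
        var terms = new List<string>();
        string[] segments = input.Split('"');

        for (int i = 0; i < segments.Length; i++)
        {
            // After splitting on '"', odd-indexed segments follow an opening quote.
            bool insideQuotes = i % 2 == 1;
            // An odd-indexed final segment means that quote was never closed.
            bool unclosed = insideQuotes && i == segments.Length - 1;

            if (insideQuotes && !unclosed)
            {
                // Keep quoted text verbatim, skipping empty "" pairs.
                if (segments[i].Length > 0) terms.Add(segments[i]);
            }
            else
            {
                terms.AddRange(segments[i].Split(delimiters, StringSplitOptions.RemoveEmptyEntries));
            }
        }
        return terms;
    }
}
```

For the sample string in the question this gives the eight entries listed there, including the trailing space in "string's are, more " and the leading space in " making".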


I wrote this ParseLine function a few months ago for VB.NET; it may be of some use to you. It works out whether there are text qualifiers and splits the line accordingly. I'll try to convert it to C# for you in the next few minutes if you want me to.

You would have your line of text:

sample' "string's are, more "text" making" 12.34,hello"pineapple sundays

Pass that as strLine, set strDataDelimiter = "," and set strTextQualifier = """"

Hope this helps you out.

Public Function ParseLine(ByVal strLine As String, Optional ByVal strDataDelimiter As String = "", Optional ByVal strTextQualifier As String = "", Optional ByVal strQualifierSplitter As Char = vbTab) As String()
        Try
            Dim strField As String = Nothing
            Dim strNewLine As String = Nothing
            Dim lngChrPos As Integer = 0
            Dim bUseQualifier As Boolean = False
            Dim bRemobedLastDel As Boolean = False
            Dim bEmptyLast As Boolean = False   ' Take into account where the line ends in a field delimiter, the ParseLine function should keep that empty field as well.


            Dim strList As String()

            'TEST,23479234,Just Right 950g,02/04/2006,1234,5678,9999,0000
            'TEST,23479234,Just Right 950g,02/04/2006,1234,5678,9999,0000,
            'TEST,23479234,Just Right 950g,02/04/2006,1234,,,0000,
            'TEST,23479234,Just Right 950g,02/04/2006,1234,5678,9999,,
            'TEST,23479234,"Just Right 950g, BO",02/04/2006,,5678,9999,,
            'TEST,23479234,"Just Right"" 950g, BO",02/04/2006,,5678,9999,1111,
            'TEST23479234 'Kellogg''s Just Right 950g' 02/04/2006 1234 5678 0000 9999
            'TEST23479234 '' 02/04/2006 1234 5678 0000 9999

            bUseQualifier = strTextQualifier.Length() > 0

            'split data based on options..
            If bUseQualifier Then
                'replace double qualifiers for ease of parsing..
                'strLine = strLine.Replace(New String(strTextQualifier, 2), vbTab)

                'loop and find each field..
                Do Until strLine = Nothing

                    If strLine.Substring(0, 1) = strTextQualifier Then

                        'find closing qualifier
                        lngChrPos = strLine.IndexOf(strTextQualifier, 1)

                        'check for missing double qualifiers, unclosed qualifiers
                        Do Until (strLine.Length() - 1) = lngChrPos OrElse lngChrPos = -1 OrElse _
                          strLine.Substring(lngChrPos + 1, 1) = strDataDelimiter

                            lngChrPos = strLine.IndexOf(strTextQualifier, lngChrPos + 1)
                        Loop

                        'get field from line..
                        If lngChrPos = -1 Then
                            strField = strLine.Substring(1)
                            strLine = vbNullString
                        Else
                            strField = strLine.Substring(1, lngChrPos - 1)
                            If (strLine.Length() - 1) = lngChrPos Then
                                strLine = vbNullString
                            Else
                                strLine = strLine.Substring(lngChrPos + 2)
                                If strLine = "" Then
                                    bEmptyLast = True
                                End If
                            End If

                            'strField = String.Format("{0}{1}{2}", strTextQualifier, strField, strTextQualifier)
                        End If

                    Else
                        'find next delimiter..
                        'lngChrPos = InStr(1, strLine, strDataDelimiter)
                        lngChrPos = strLine.IndexOf(strDataDelimiter)

                        'get field from line..
                        If lngChrPos = -1 Then
                            strField = strLine
                            strLine = vbNullString
                        Else
                            strField = strLine.Substring(0, lngChrPos)
                            strLine = strLine.Substring(lngChrPos + 1)
                            If strLine = "" Then
                                bEmptyLast = True
                            End If
                        End If
                    End If

                    ' Now replace double qualifiers with a single qualifier in the "corrected" string
                    strField = strField.Replace(New String(strTextQualifier, 2), strTextQualifier)

                    'restore double qualifiers..
                    'strField = IIf(strField = vbNullChar, vbNullString, strField)
                    'strField = Replace$(strField, vbTab, strTextQualifier)
                    'strField = IIf(strField = vbTab, vbNullString, strField)
                    'strField = strField.Replace(vbTab, strTextQualifier)

                    'save field to array..
                    strNewLine = String.Format("{0}{1}{2}", strNewLine, strQualifierSplitter, strField)

                Loop

                If bEmptyLast = True Then
                    strNewLine = String.Format("{0}{1}", strNewLine, strQualifierSplitter)
                End If

                'trim off first nullchar..
                strNewLine = strNewLine.Substring(1)

                'split new line..
                strList = strNewLine.Split(strQualifierSplitter)
            Else
                If strLine.Substring(strLine.Length - 1, 1) = strDataDelimiter Then
                    strLine = strLine.Substring(0)
                End If
                'no qualifier.. do a simply split..
                strList = strLine.Split(strDataDelimiter)
            End If

            'return result..
            Return strList

        Catch ex As Exception
            Throw New Exception(String.Format("Error Splitting Special String - {0}", ex.Message.ToString()))
        End Try
    End Function


If you want to display an underscore in place of the space before a ", as shown in your question, you can use:

string[] splitString = t.Replace(" \"", "_\"").Split('"');


Regular expressions for this sort of thing get complicated fast as you start to add all sorts of exceptions.

Nonetheless, more for the sake of interest and completeness than anything else:

(?<term>[a-zA-Z0-9'.=]+)|("(?<term>[^"]+)")
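
A rough sketch of how that pattern could be applied in C#, assuming the terms are read from the named group term (TermRegex.Extract is a hypothetical helper name). .NET allows the same group name in both alternatives, so whichever branch matches fills term.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TermRegex
{
    // Same pattern as above, written as a verbatim string ("" is an escaped quote).
    private static readonly Regex Pattern =
        new Regex(@"(?<term>[a-zA-Z0-9'.=]+)|(""(?<term>[^""]+)"")");

    // Hypothetical helper: returns the value of "term" for every match, in order.
    public static List<string> Extract(string input)
    {
        var terms = new List<string>();
        foreach (Match m in Pattern.Matches(input))
        {
            terms.Add(m.Groups["term"].Value);
        }
        return terms;
    }
}
```

An opening quote with no closing partner simply fails the second alternative, so the words after it are picked up one by one by the first alternative.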