
Detecting tab and newline characters in text files

I need to parse a raw text file that has one item per line, with tab-delimited fields.

How can I detect tab and newline characters in a plain text document? I was thinking of using the Java APIs for this, but if you know a faster, easy-to-use language for text parsing, please let me know.

Thanks


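public class TabNewlineDemo {
    public static void main(String[] args) {
        String str = "Hello\tworld\nHello Universe";
        System.out.println(str);
        System.out.println(str.contains("\t"));  // true if a tab is present
        System.out.println(str.indexOf("\t"));   // index of the first tab, or -1
        System.out.println(str.contains("\n"));  // true if a newline is present
        System.out.println(str.indexOf("\n"));   // index of the first newline, or -1
    }
}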

Output:

Hello        world
Hello Universe
true
5
true
11
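contains() and indexOf() only tell you whether a character occurs and where it first occurs. To find every tab or line break you can keep calling indexOf(ch, fromIndex) from just past the previous hit. The sketch below does that; the class and method names are hypothetical, and the sample string adds a Windows-style "\r\n" ending because files written on Windows contain '\r' before each '\n'.

```java
import java.util.ArrayList;
import java.util.List;

class ControlCharFinder {

    // Returns every index at which ch occurs in text (empty list if none).
    static List<Integer> positionsOf(String text, char ch) {
        List<Integer> positions = new ArrayList<>();
        int i = text.indexOf(ch);
        while (i != -1) {
            positions.add(i);
            i = text.indexOf(ch, i + 1);   // continue searching after the last hit
        }
        return positions;
    }

    public static void main(String[] args) {
        // Hypothetical sample mixing a Unix "\n" and a Windows "\r\n" line ending
        String str = "Hello\tworld\nHello Universe\tagain\r\nlast";
        System.out.println("tabs:     " + positionsOf(str, '\t'));
        System.out.println("newlines: " + positionsOf(str, '\n'));
        System.out.println("returns:  " + positionsOf(str, '\r'));
    }
}
```

For this sample, main prints the tab positions [5, 26], the newline positions [11, 33] and the carriage-return position [32].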


You can try this

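 // Needs java.io.BufferedReader, java.io.FileReader, java.io.IOException, java.util.Scanner
 try (BufferedReader br = new BufferedReader(new FileReader(file1)))
 {
     String strLine;
     // Read each line exactly once; readLine() returns null at end of file
     while ((strLine = br.readLine()) != null)
     {
         Scanner str = new Scanner(strLine);
         str.useDelimiter("\t");           // split the line on tab characters
         while (str.hasNext())
         {
             String field = str.next();    // one tab-separated field
             // process field here
         }
         str.close();
     }
 } catch (IOException e)
 {
     e.printStackTrace();                  // don't silently swallow I/O errors
 }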
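A simpler alternative to the per-line Scanner is String.split("\t", -1), which turns each line straight into an array of fields. This is a swapped-in technique rather than what the answer above uses; the class name and sample data are hypothetical, and a StringReader stands in for the real file so the sketch is self-contained. The -1 limit matters: without it, split() drops trailing empty fields, so a line ending in a tab would lose its last (empty) column.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TabFileParser {

    // Reads one record per line and splits each line on tab characters.
    // The -1 limit keeps trailing empty fields ("2\tBob\t" -> ["2", "Bob", ""]).
    static List<String[]> parse(BufferedReader reader) {
        List<String[]> records = new ArrayList<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {   // line terminator already stripped
                records.add(line.split("\t", -1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);   // surface I/O errors instead of hiding them
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical data; for a real file use new BufferedReader(new FileReader(file))
        String data = "id\tname\tcity\n1\tAlice\tParis\n2\tBob\t\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(data))) {
            for (String[] fields : parse(reader)) {
                System.out.println(Arrays.toString(fields));
            }
        }
    }
}
```

Running main prints [id, name, city], [1, Alice, Paris] and [2, Bob, ] on three lines; the last record keeps its empty third field.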


You can use Google's Guava library.
Have a look at CharMatcher and at the Guava slides.

Here is an example:

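import static org.hamcrest.CoreMatchers.is;
import static org.junit.Assert.assertThat;

import com.google.common.base.CharMatcher;
import org.junit.Test;

public class GuavaMatcherTest {

    @Test
    public void testGuavaMatcher() {
        String str = "Hello\tworld\nHello Universe";

        CharMatcher tabMatcher = CharMatcher.is('\t');
        CharMatcher newLineMatcher = CharMatcher.is('\n');

        // indexIn() gives the first matching index; matchesAnyOf() reports presence
        assertThat(tabMatcher.indexIn(str), is(5));
        assertThat(tabMatcher.matchesAnyOf(str), is(true));
        assertThat(newLineMatcher.indexIn(str), is(11));
        assertThat(newLineMatcher.matchesAnyOf(str), is(true));

        // or() combines matchers; removeFrom() deletes every matching character
        CharMatcher tabAndNewLineMatcher = tabMatcher.or(newLineMatcher);
        assertThat(tabAndNewLineMatcher.removeFrom(str), is("HelloworldHello Universe"));
    }
}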

You can also look at the CharMatcher.BREAKING_WHITESPACE constant (in newer Guava releases, the CharMatcher.breakingWhitespace() method).


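Text files do not contain markup as such; a tab is just the character '\t' and a line break is '\n' (or "\r\n" on Windows). Read each line with BufferedReader.readLine(), which strips the line terminator for you, and find tabs by searching each line for "\t".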
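To illustrate the readLine() approach: BufferedReader.readLine() treats "\n", "\r\n" and a lone "\r" all as line ends and drops them, so you never have to detect the newline yourself and only need to look for tabs inside each line. The sketch below counts the tabs per line; the class name, method name and mixed-ending sample input are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LineTabCounter {

    // Returns the number of tab characters on each line of the input.
    // readLine() accepts "\n", "\r\n" or "\r" as a line end and removes it,
    // so the strings it returns never contain a line terminator.
    static List<Integer> tabsPerLine(String text) {
        List<Integer> counts = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = br.readLine()) != null) {
                int tabs = 0;
                for (int i = 0; i < line.length(); i++) {
                    if (line.charAt(i) == '\t') {
                        tabs++;
                    }
                }
                counts.add(tabs);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical input mixing Unix, Windows and old Mac line endings
        String text = "a\tb\nc\td\te\r\nf\rg";
        System.out.println(tabsPerLine(text));
    }
}
```

For the sample input, main prints [1, 2, 0, 0]: four lines, even though three different terminators were used.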
