Java: calculate line number from character position according to the number of "\n"

I know the character positions of matches, e.g. 1 3 7 8. I need to know their corresponding line numbers.

Example: file.txt

Match: X

Matches: 1 3 7 8.

Want: 1 2 4 4

$ cat file.txt
X2
X
4
56XX

[Added: my attempt below does not report multiple matches on the same line; there is probably an easier way to do it with stacks.]

$ java testt     
1
2
4
$ cat testt.java 
import java.io.*;
import java.util.*;
public class testt {

    public static String data ="X2\nX\n4\n56XX";
    public static String[] ar = data.split("\n");

    public static void main(String[] args){
        HashSet<Integer> hs = new HashSet<Integer>();
        Integer numb = 1;
        for(String s : ar){
            if(s.contains("X")){
                hs.add(numb);
                numb++;
            }else{
                numb++;
            }
        }   
        for (Integer i : hs){
            System.out.println(i);
        }
    }
}


To begin with, your example is invalid--the character X in your sample is found at positions (0,3,9,10), not (1,3,7,8). You're leaving the linefeed characters out of your reckoning, and you're starting the count at index 1 when you should start at zero.

The only way to relate absolute positions to line numbers is to map the positions of the line breaks for comparison. To do that on the fly, as others have said, is not difficult--just slow and tedious. If you're going to do multiple lookups, and you know the data won't change in between times, you should create a static map. You can use a List or a Map for that, but there's a class called SizeSequence that's ideal for the purpose. Check this out:

import javax.swing.SizeSequence;

public class Test
{
  static SizeSequence createLineMap(String s)
  {
    String[] lines = s.split("(?<=\n)");
    SizeSequence sseq = new SizeSequence(lines.length);
    for (int i = 0; i < lines.length; i++)
    {
      sseq.setSize(i, lines[i].length());
    }
    return sseq;
  }

  public static void main(String[] args) throws Exception
  {
    String input = "X2\nX\n4\n56XX";
    SizeSequence lineMap = createLineMap(input);
    String target = "X";
    int pos = -1;
    while ((pos = input.indexOf(target, pos+1)) != -1)
    {
      System.out.printf("'%s' found in line %d (index %d)%n",
          target, lineMap.getIndex(pos) + 1, pos);
    }
  }
}

output:

'X' found in line 1 (index 0)
'X' found in line 2 (index 3)
'X' found in line 4 (index 9)
'X' found in line 4 (index 10)

Notice how I split on the lookbehind (?<=\n) instead of just \n. That way I ensure that each line's character count includes the linefeed; all characters must be counted. (And on that note, I know there are issues with different line separators and surrogate pairs, but I'm leaving them out for clarity's sake.)

You could use the same technique on a file by substituting Scanner's findWithinHorizon() method for split() and indexOf().
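As mentioned above, a plain List can serve as the line map as well. Here is a minimal sketch of that variant, assuming "\n" line separators only: it records the index where each line starts and binary-searches that list. The names LineStarts, lineStarts and lineOf are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class LineStarts
{
  // Record the index at which each line begins; line 1 always starts at 0.
  static List<Integer> lineStarts(String s)
  {
    List<Integer> starts = new ArrayList<>();
    starts.add(0);
    for (int i = s.indexOf('\n'); i != -1; i = s.indexOf('\n', i + 1))
    {
      starts.add(i + 1);
    }
    return starts;
  }

  // Map a 0-based character index to a 1-based line number.
  static int lineOf(List<Integer> starts, int pos)
  {
    int i = Collections.binarySearch(starts, pos);
    // Hit: pos is the first character of line i + 1.
    // Miss: binarySearch returns -(insertionPoint) - 1, and the
    // insertion point equals the number of lines starting at or before pos.
    return i >= 0 ? i + 1 : -i - 1;
  }

  public static void main(String[] args)
  {
    String input = "X2\nX\n4\n56XX";
    List<Integer> starts = lineStarts(input);
    for (int pos = input.indexOf('X'); pos != -1; pos = input.indexOf('X', pos + 1))
    {
      System.out.println("index " + pos + " -> line " + lineOf(starts, pos));
    }
  }
}
```

Building the list is a single pass over the text, and each lookup afterwards is a binary search, so this only pays off when you do many lookups against unchanged data.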


import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LineMatches {

    public static String data ="X2\naaaaXXaaaa\naaaa\naaaaaaX\naaaaaaXaX";
    public static String[] lines = data.split("\n");

    public static void main(String[] args){
        // line number -> 0-based positions of "X" within that line
        // (TreeMap iterates keys in ascending order, so lines print in order)
        Map<Integer, List<Integer>> result = new TreeMap<Integer, List<Integer>>();

        Integer lineNum = 1;

        for(String s : lines){

            boolean keepSearching = true;
            boolean start = true; // the first search on each line begins at index 0
            List<Integer> charPositions=null;
            Integer charNum=0, lastCharNum=0;

            while(keepSearching){

                if (start == true){
                    charNum = s.indexOf("X", lastCharNum);
                    start = false;
                }else{
                    // continue just past the previous match
                    charNum = s.indexOf("X", lastCharNum+1);
                }

                if(charNum >= 0){
                    if(charPositions== null){
                        charPositions = new ArrayList<Integer>();
                    }
                    charPositions.add(charNum);
                    lastCharNum = charNum;
                }else{
                    keepSearching = false;
                    if(charPositions!= null){
                        result.put(lineNum, charPositions);
                    }
                }
            }

            lineNum++;

        }
        for (Integer i : result.keySet()){
            System.out.print("Line "+i+" : ");
            for(Integer j : result.get(i)){
                System.out.print("at char "+j+", ");
            }
            System.out.println();
        }
    }
}

Output :
Line 1 : at char 0, 
Line 2 : at char 4, at char 5, 
Line 4 : at char 6, 
Line 5 : at char 6, at char 8,


Increment your counter every time you read a line, instead of every time you read a character. If you're reading one character at a time, increment whenever you see an EOL character.
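A rough sketch of the character-at-a-time version, assuming "\n" is the only EOL character. The class and method names are invented for the example, and a StringReader stands in for a reader over file.txt:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LineCounter {

    // Returns the 1-based line number of every occurrence of target.
    static List<Integer> linesOf(Reader in, char target) {
        List<Integer> found = new ArrayList<>();
        int line = 1;
        try {
            int c;
            while ((c = in.read()) != -1) {
                if (c == target) {
                    found.add(line);
                }
                if (c == '\n') {
                    line++; // an EOL character ends the current line
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        Reader in = new StringReader("X2\nX\n4\n56XX");
        System.out.println(linesOf(in, 'X')); // prints [1, 2, 4, 4]
    }
}
```

For a real file you would pass a buffered reader instead, since calling read() on an unbuffered file reader one character at a time is slow.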


  1. Um... By reading the file line by line until you get a match and increasing a counter for each line you've seen?
  2. No.