Extracting strings from a file
Hi, I have written a Java program to get the molecular function and biological process from a file if the ID matches, but I'm getting a StringIndexOutOfBoundsException. Can anyone please correct it? Thanks in advance. Here is my input:
chr11 RAP3_rep mRNA 17114958 17117968 . + . ID=Os11t0448200-01;Name=Os11t0448200-01;Gene_symbols=AM14;GO=Molecular Function: protein kinase activity (GO:0004672),Molecular Function: ATP binding (GO:0005524),Biological Process: protein amino acid phosphorylation (GO:0006468),Molecular Function: protein tyrosine kinase activity (GO:0004713),Molecular Function: protein serine/threonine kinase activity (GO:0004674);ID_converter=Os11g0448200;InterPro=Protein kinase, core (IPR000719),Tyrosine protein kinase (IPR001245),Serine/threonine protein kinase (IPR002290),Serine/threonine protein kinase, active site (IPR008271),Protein kinase-like (IPR011009),Serine/threonine protein kinase-related (IPR017442);Link_to=8185 (Oryzabase),Protein kinase%2C core (Plant Gene Family Database);Locus_id=Os11g0448200;Note=Arbuscular mycorrhizal specific marker 14.;ORF_evidence=Q53JE9 (UniProt);Transcript_evidence=Inferred from reference;Sequence_download=Os11t0448200-01;References=19033527%2C 15905328;Status=manual curation (Oct 29%2C 2010)
chr11 RAP3_rep CDS 17114958 17115039 . + . Parent=Os11t0448200-01
chr11 RAP3_rep CDS 17115846 17115869 . + . Parent=Os11t0448200-01
chr11 RAP3_rep CDS 17115970 17116095 . + . Parent=Os11t0448200-01
chr11 RAP3_rep CDS 17116205 17116546 . + . Parent=Os11t0448200-01
chr11 RAP3_rep CDS 17116669 17116784 . + . Parent=Os11t0448200-01
chr11 RAP3_rep CDS 17116880 17117140 . + . Parent=Os11t0448200-01
chr11 RAP3_rep CDS 17117589 17117786 . + . Parent=Os11t0448200-01
chr11 RAP3_rep CDS 17117891 17117968 . + . Parent=Os11t0448200-01
chr11 RAP3_rep mRNA 17565866 17568694 . - . ID=Os11t0455500-01;Name=Os11t0455500-01;Alias=AK059712,AK060299,AK119539,AK122115;ID_converter=Os11g0455500;Link_to=S-adenosyl-L-homocysteine hydrolase (Plant Gene Family Database);Locus_id=Os11g0455500;NIAS_FLcDNA=001-032-F05;Note=Similar to Adenosylhomocysteinase-like protein.;ORF_evidence=Q84VE1 (UniProt);Transcript_evidence=AK059712 (DDBJ%2C Best hit);Sequence_download=Os11t0455500-01;InterPro=NAD(P)-binding (IPR016040),S-adenosyl-L-homocysteine hydrolase (IPR000043),S-adenosyl-L-homocysteine hydrolase%2C NAD binding (IPR015878);GO=Molecular Function: catalytic activity (GO:0003824),Molecular Function: binding (GO:0005488),Biological Process: metabolic process (GO:0008152),Molecular Function: adenosylhomocysteinase activity (GO:0004013),Biological Process: one-carbon compound metabolic process (GO:0006730);Expression=AK059712
chr11 RAP3_rep CDS 17567891 17568694 . - . Parent=Os11t0455500-01;
chr11 RAP3_rep CDS 17566493 17567029 . - . Parent=Os11t0455500-01;
chr11 RAP3_rep CDS 17566191 17566400 . - . Parent=Os11t0455500-01;
And here is the program:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.InputStreamReader;
import java.io.ObjectInputStream.GetField;
import java.util.ArrayList;
import java.util.Scanner;
public class Sample
{
public static void main(String args[]) throws FileNotFoundException
{
Sample s=new Sample();
String inputID="Os11t0120200-01";
//System.out.println("Enter the value");
//Scanner sc=new Scanner(System.in);
//n=sc.nextLong();
ArrayList<String> IDlist=new ArrayList<String>();
ArrayList<String> InputIDlist=new ArrayList<String>();
int n;
try
{
File nf=new File("textfile1.txt");
FileOutputStream fop1=new FileOutputStream(nf,true);
String os ="";
FileInputStream fis1=new FileInputStream("chr11.gb");
FileInputStream fis2=new FileInputStream("1.txt");
InputStreamReader in1 = new InputStreamReader(fis1, "UTF-8");
InputStreamReader in2 = new InputStreamReader(fis2, "UTF-8");
BufferedReader input1 = new BufferedReader(in1);
BufferedReader input2 = new BufferedReader(in2);
String line1;
String line2;
FileInputStream fis=new FileInputStream("chr11.GB");
InputStreamReader in = new InputStreamReader(fis, "UTF-8");
BufferedReader input = new BufferedReader(in);
String line;
File f=new File("1.GB");
FileOutputStream fop=new FileOutputStream(f);
if(f.exists())
{
os="This data is written through the program\t\n";
fop1.write(os.getBytes());
String str1="";
String str2="";
os="The data has been written\t\n";
fop1.write(os.getBytes());
while((line=input.readLine())!=null)
{
String splits[]=line.split("\t");
if(splits[2].equalsIgnoreCase("mrna"))
{
IDlist.add((splits[8]));
}
}
while((line=input2.readLine())!=null)
{
String splits[]=line.split("\t");
if(splits[0]!="")
{
InputIDlist.add((splits[0]));
}
}
for(int j=0; j<InputIDlist.size(); j++)
{
for(int i=0; i<IDlist.size(); i++)
{
if((IDlist.get(i).substring(3, 18).toString()).equals(InputIDlist.get(j)))
{
if(IDlist.get(i).contains("Alias"))
{
os=IDlist.get(i).substring(IDlist.get(i).lastIndexOf("Alias"),IDlist.get(i).lastIndexOf("ID_converter"))+"\t\n";
fop1.write(os.getBytes());
}
if(IDlist.get(i).contains("Biological Process"))
{
//n=IDlist.get(i).lastIndexOf("Biological Process");
os=IDlist.get(i).substring(IDlist.get(i).lastIndexOf("Biological Process"),IDlist.get(i).lastIndexOf(";"))+"\t\n";
fop1.write(os.getBytes());
}
if(IDlist.get(i).contains("Molecular Function"))
{
//n=IDlist.get(i).lastIndexOf("Molecular Function");
os=IDlist.get(i).substring(IDlist.get(i).lastIndexOf("Molecular Function"), IDlist.get(i).lastIndexOf(","))+"\t\n";
fop1.write(os.getBytes());
}
break;
}
String p="\n";
fop1.write(p.getBytes());
}
}
}
else
{
System.out.println("This file is not exist");
}
}
catch (Exception e)
{
e.printStackTrace();
}
}
}
I agree with the comments on the question, but I'll still try a guess:
Most likely, it is the following call (given the StringIndexOutOfBoundsException): IDlist.get(i).substring(3, 18). If the string is shorter than 18 characters, you'd get that exception.
A related problem, though probably not the direct cause of the exception, is this part:
if(splits[0]!="")
{
InputIDlist.add((splits[0]));
}
If splits[0] is empty, splits[0] == "" might still not be true (and thus splits[0] != "" might be true). Use !splits[0].equals("") here (or better, !"".equals(splits[0]), which does not throw a NullPointerException if splits[0] is ever null). Note that == checks for reference equality, i.e. whether both references point to the same object (in C++ terms, whether it is the same pointer), whereas equals checks for logical equality (which may be implemented differently for each class).
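To make the difference between == and equals concrete, here is a minimal sketch. The class EqualityDemo and the helpers isNonEmpty and firstColumns are hypothetical names made up for this illustration; they are not part of the program in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class EqualityDemo {

    // Content-based check that also treats null as "empty" instead of throwing.
    static boolean isNonEmpty(String s) {
        return s != null && !s.isEmpty();
    }

    // Hypothetical helper: collect the first tab-separated column of each
    // line, skipping lines whose first column is empty.
    static List<String> firstColumns(List<String> lines) {
        List<String> ids = new ArrayList<>();
        for (String line : lines) {
            String[] splits = line.split("\t");
            if (isNonEmpty(splits[0])) {
                ids.add(splits[0]);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        String empty = new String("");          // distinct object, empty content
        System.out.println(empty != "");        // true: the references differ
        System.out.println(!"".equals(empty));  // false: the contents are equal
        System.out.println(firstColumns(
                Arrays.asList("Os11t0448200-01\tx", "", "\ty")));
    }
}
```

With the reference comparison, the empty string would have been added to the list; with the content comparison it is skipped.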
Edit:
Another possibility for that exception would be one of those lines:
os=IDlist.get(i).substring(IDlist.get(i).lastIndexOf("Alias"),IDlist.get(i).lastIndexOf("ID_converter"))
You check for "Alias", so lastIndexOf("Alias") should not return -1, but IDlist.get(i).lastIndexOf("ID_converter") might. If so, the end index is -1 and you are out of bounds.
Edit 2:
Yet another thing: even if both strings ("Alias" and "ID_converter") are in the source string but in the wrong order ("ID_converter ... Alias"), you'd get that exception as well, since the begin index would then be greater than the end index, which is not allowed (please read the JavaDoc on String.substring()).
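Both failure modes (a missing marker and markers in the wrong order) can be guarded against before calling substring(). The following is only a sketch under assumptions: SafeSubstring and between are hypothetical names, and unlike the original code it searches for the end marker forward from the start marker rather than using lastIndexOf for both.

```java
final class SafeSubstring {

    // Returns the text from the last occurrence of startMarker up to the first
    // occurrence of endMarker that follows it, or null if either is missing.
    static String between(String source, String startMarker, String endMarker) {
        int begin = source.lastIndexOf(startMarker);
        if (begin < 0) {
            return null; // start marker not present
        }
        // Searching from behind the start marker guarantees begin <= end.
        int end = source.indexOf(endMarker, begin + startMarker.length());
        if (end < 0) {
            return null; // end marker missing, or it only occurs before the start
        }
        return source.substring(begin, end);
    }

    public static void main(String[] args) {
        String attrs =
            "ID=Os11t0455500-01;Alias=AK059712,AK060299;ID_converter=Os11g0455500";
        // prints "Alias=AK059712,AK060299;"
        System.out.println(between(attrs, "Alias", "ID_converter"));
        // prints "null": the markers are in the wrong order
        System.out.println(between("ID_converter=x;Alias=y", "Alias", "ID_converter"));
    }
}
```

The caller then only writes the result when it is not null, so no index ever reaches substring() unchecked.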
Change:
if (IDlist.get(i).contains("Alias"))
To:
if ((IDlist.get(i).contains("Alias")) && (IDlist.get(i).contains("ID_converter")))
Then set a breakpoint to check why the second condition is false if execution no longer enters the if statement.