Optimize Oracle SQL with large 'IN' clause
Here I have a query like below:
SELECT field
FROM table
WHERE value IN ('val1', 'val2', 'val3', ... 'valn')
Let's say there are 2000 values inside the IN clause, and the values don't exist in any other table. Do you have any ideas for speeding this operation up?
The question is open to any kind of method.
Thanks!
Create an index that covers value and field, e.g.
create index ix_table_value on table (value, field);
so that the query can be answered from the index alone.
Place those IN values in a temp table and join on it.
SELECT field
FROM table
WHERE value IN (SELECT somevalue FROM sometable)
As far as I know, you will also face another problem: Oracle limits a list of literals in an IN clause to 1,000 elements (ORA-01795). Using a subquery you avoid that limit, and hopefully it speeds up your query.
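If you do have to stay with a plain IN list, a common workaround for the 1,000-element limit is to OR several IN groups together. A minimal Java sketch (the class and method names are mine; only the column name comes from the question):

```java
import java.util.Collections;

public class InListSplitter {
    // Oracle raises ORA-01795 for IN lists longer than 1000 elements.
    // Split the list into groups of at most groupSize placeholders and
    // OR them together: value IN (?,...,?) OR value IN (?,...,?)
    static String buildPredicate(String column, int totalValues, int groupSize) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < totalValues; i += groupSize) {
            int n = Math.min(groupSize, totalValues - i);
            if (i > 0) {
                sb.append(" OR ");
            }
            sb.append(column)
              .append(" IN (")
              .append(String.join(",", Collections.nCopies(n, "?")))
              .append(")");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 3 values with groups of 2 -> two IN groups
        System.out.println(buildPredicate("value", 3, 2));
    }
}
```

Note that each distinct group count produces a different statement text, so batching values to a fixed group size (padding the last group with repeats of one value, for instance) keeps the number of distinct statements Oracle has to parse small.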
You can join a normal table with a memory table that is filled with the list of values.
I don't know how to do that with Java exactly, but I do know how to do it with C#. Something similar should be possible in Java.
Read here: http://forums.oracle.com/forums/thread.jspa?threadID=892457&tstart=375
Let's use a collection of user-defined types (UDTs). First create a table with 1 million rows:
create table employees (id number(10) not null primary key, name varchar2(100) );
insert into employees
select level l, 'MyName'||to_char(level)
from dual connect by level <= 1e6;
1000000 rows created
commit;
exec dbms_stats.gather_schema_stats(USER, cascade=>TRUE);
Now we turn to the C# code. Let's select the employees with ids 3 and 4.
The collection type MDSYS.SDO_ELEM_INFO_ARRAY is used because reusing this predefined Oracle type saves us from defining our own. You can fill MDSYS.SDO_ELEM_INFO_ARRAY with at most 1,048,576 numbers.
using System;
using System.Data;
using System.Windows.Forms;
using Oracle.DataAccess.Client;
using Oracle.DataAccess.Types;

[OracleCustomTypeMappingAttribute("MDSYS.SDO_ELEM_INFO_ARRAY")]
public class NumberArrayFactory : IOracleArrayTypeFactory
{
    // Called by ODP.NET to allocate the array backing the collection parameter.
    public Array CreateArray(int numElems)
    {
        return new Decimal[numElems];
    }

    public Array CreateStatusArray(int numElems)
    {
        return null; // no per-element null status needed
    }
}

private void Test()
{
    OracleConnectionStringBuilder b = new OracleConnectionStringBuilder();
    b.UserID = "sna";
    b.Password = "sna";
    b.DataSource = "ora11";
    using (OracleConnection conn = new OracleConnection(b.ToString()))
    {
        conn.Open();
        using (OracleCommand comm = conn.CreateCommand())
        {
            // Join against the bound collection instead of a literal IN list.
            comm.CommandText =
                @" select /*+ cardinality(tab 10) */ * " +
                @" from employees, table(:1) tab " +
                @" where employees.id = tab.column_value";
            OracleParameter p = new OracleParameter();
            p.OracleDbType = OracleDbType.Array;
            p.Direction = ParameterDirection.Input;
            p.UdtTypeName = "MDSYS.SDO_ELEM_INFO_ARRAY";
            p.Value = new Decimal[] { 3, 4 }; // the list of id values
            comm.Parameters.Add(p);
            int numPersons = 0;
            using (OracleDataReader reader = comm.ExecuteReader())
            {
                while (reader.Read())
                {
                    MessageBox.Show("Name " + reader[1].ToString());
                    numPersons++;
                }
            }
        }
    }
}
The index on employees.id isn't used if one omits the hint /*+ cardinality(tab 10) */. This index is created automatically by Oracle because id is the primary key column.
This means that you don't have to fill a temporary table: the list of values stays in RAM, and you join your employees table with this in-memory list, table(:1) tab.
This looks like the right way in Java: http://knol.google.com/k/oracle-passing-a-list-as-bind-variable#
It is similar to the C# solution: your list of values stays in memory (no temporary table), it is never persisted to disk, and because you use a parameterized query the query executor doesn't have to reparse each statement. I have not tried it with Java, but I think it will be fast.
If you already have an index on the value field, and the values are not available in any table to join or sub-select from, then I don't think there is much left to optimize. In the special case where your values really are "val1", "val2", ..., you could use a LIKE query, which would use the index to search on the common prefix. But I assume that was just an example.
Oracle's query parsing and caching mechanisms work better when you use bind variables, so your query may perform better written as
SELECT field FROM table WHERE value IN (?, ? ....)
with the values assigned as needed.
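A sketch of how that could look from Java/JDBC: generating one '?' per value means Oracle sees the same statement text for every run with the same list size, so the cached execution plan is reused (the class and method names are mine; only the table and column names come from the question):

```java
import java.util.Collections;

public class BindInQuery {
    // One '?' per value: Oracle hard-parses one statement per distinct list
    // size, instead of one per distinct set of literal values.
    static String buildSql(int valueCount) {
        String placeholders = String.join(",", Collections.nCopies(valueCount, "?"));
        return "SELECT field FROM table WHERE value IN (" + placeholders + ")";
    }

    public static void main(String[] args) {
        System.out.println(buildSql(3));
        // -> SELECT field FROM table WHERE value IN (?,?,?)

        // With a live connection you would then bind the values (sketch only):
        // PreparedStatement ps = conn.prepareStatement(buildSql(values.size()));
        // for (int i = 0; i < values.size(); i++) ps.setString(i + 1, values.get(i));
    }
}
```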
It is much better to analyse the performance of real executions using a tool like the Enterprise Manager Console and decide on improvements from there. Creating an index is probably the first step.
Storing the potential values in another table and using the suggestions from J Horstmann seems like the right idea. Please give it a try.
More information is required to select the best solution:
- Does query execute frequently?
- Are values val1, val2 fixed?
- How big is the table?
If the query is executed frequently, the values val1, val2, etc. are fixed, and the table is large (say 20,000 or more rows), then store all the values in another table (say, a temp table) and join the two tables on the value field.
If the table in the query below is large, there should be an index on the value field to improve performance.
SELECT field FROM table WHERE value IN ('val1', 'val2', 'val3', ... 'valn')
Both tables should be analyzed.
The reason for the better performance is that the optimizer will choose the best join method depending on the characteristics of the tables. If the table in the above query is very large, the join will be a nested-loop join, and that table should have an index on the value column.
If the table in the above query is very small (say fewer than 200-300 rows), the new table (the temp table) should have an index on the value column.
If both tables are of almost the same size, indexes will not help much.
Conclusion: Best solution depends on specific situation.
If none of the other suggestions work, and the query takes a long time, you can try running it in parallel.
select /*+ parallel(table) */ field ...
Just rewrite your IN as EXISTS (assuming the values are available in a table you can correlate against):
SELECT field FROM table t WHERE EXISTS (SELECT 1 FROM sometable s WHERE s.somevalue = t.value)
It is often faster.
I got acceptable performance (execution time close to fetching rows unconditionally) when doing a similar query with the following approach.
static final int MAX_QUERY_SET = 1000;
I iterate over the values and make a separate query for every MAX_QUERY_SET
values. So for 10K values I have 10 queries, which I process sequentially.
After implementing this algorithm I was able to play with the constant. With a value of either 30 or 3000 I got 3x longer execution times, so I stuck with 1000.
This may not work if you can't issue multiple queries. My experience comes from a different database (Pervasive, with a limit of 65K characters per statement), but I think this question is quite general and its conclusions should carry over.
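The batching described above can be sketched like this in Java (the class and method names and the use of subList are my own choices, not from the original answer):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedInQueries {
    static final int MAX_QUERY_SET = 1000;

    // Split the full value list into batches of at most MAX_QUERY_SET elements;
    // each batch then becomes one IN-list query, executed sequentially.
    static <T> List<List<T>> toBatches(List<T> values) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < values.size(); i += MAX_QUERY_SET) {
            batches.add(new ArrayList<>(
                values.subList(i, Math.min(i + MAX_QUERY_SET, values.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < 2500; i++) values.add(i);
        System.out.println(toBatches(values).size() + " batches"); // -> 3 batches
    }
}
```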