Generate SQL JOIN statements
Background
The following data is a specific example of a general problem (sorted by column name):
Primary Key: as_comp.companynamecurrent.companyid
Primary Key: as_comp.companylocation.companyid
Primary Key: as_comp.companylocation.locationid
Primary Key: as_hr.personemploymentcurrent.locationid
Primary Key: as_hr.personnamecurrent.personid
Primary Key: as_hr.personemploymentcurrent.personid
Primary Key: as_hr.personaddresscurrent.personid
Primary Key: as_hr.personemploymentcurrent.positionid
In actuality, the number of primary keys and their relationships is dynamic (unknown at run-time). The code that produces the listing of unique primary keys resembles the following:
// Remove duplicate columns by using a set.
//
Set<String> joins = new HashSet<String>( bundles.size() );
for( Bundle b : bundles.keySet() ) {
joins.add( b.getColumn() );
}
// Match the primary keys in the bundles.
//
for( Bundle b : bundles.keySet() ) {
if( joins.contains( b.getColumn() ) ) {
System.out.println( "Primary Key: " + b.toString() );
}
}
Each Bundle
(as in "Resource Bundle" item for internationalization) instance contains the schema, table (or view), column name, and other information used to uniquely identify a database column and facilitate translation. Each Bundle
can answer whether it is a primary key constraint via the isJoin()
method:
boolean bundle.isJoin()
Problem
Given the data in the previous section, create a set of JOIN clauses as follows:
join as_comp_companylocation as_comp_companylocation on (as_hr_personemploymentcurrent.locationid == as_comp_companylocation.locationid)
join as_hr_personnamecurrent as_hr_personnamecurrent on (as_hr_pers开发者_开发知识库onemploymentcurrent.personid == as_hr_personnamecurrent.personid)
join as_hr_personaddresscurrent as_hr_personaddresscurrent on (as_hr_personemploymentcurrent.personid == as_hr_personaddresscurrent.personid)
join as_comp_companynamecurrent as_comp_companynamecurrent on (as_comp_companylocation.companyid == as_comp_companynamecurrent.companyid)
Note that positionid
can be dropped as it is not needed for joining.
Updates
The left-side comparison condition operand are the primary keys whose corresponding table (or view) name is listed twice (or more). This yields:
- companylocation
- personemploymentcurrent
Question
How would you create the JOIN statements using the given data set?
Many thanks.
This need not be that hard.
What you basically have is a graph of tables with the edges describing how the tables are joined.
Then when you have the list of tables to be joined, you start and one and start searching the graph. As you walk the graph, keep track of the join criteria, and the visited nodes.
I would exhaustively search the graph, which could well result in several paths that link all the tables up. Simply choose the shortest path.
Finally, use the table names as a key and cache the results, then you will only have to search the graph for each unique set of tables. No real magic here I don't think, and you would have to have a LOT of tables before this would probably take any really noticeable time. It also give you the benefit of being able to select two tables that are not directly joined, the path will find any intermediate table necessary to make the connection.
If it's as static as you described, I'd prepare the queries statically, and look them up depending in the clusters selected. I.e. no algorithm involved.
Too simple? KISS :)
The algorithm turned out to be fairly in-depth. Essentially:
- Obtain the list of column mappings.
- If a column is a primary key, put it into a map based on its uniquely identifiable name.
- Remove duplicate primary keys.
- Sort the primary keys.
- Create a new map to count (tally) the number of times a primary key appears.
- Create a "duplicate list" to store the primary key columns that occur more than once.
- Iterate over the sorted primary keys.
- Add every primary key that occurs more than once to the "duplicate list".
- Iterate over the duplicate primary keys.
- Iterate over all the join columns.
- If the column equals one of the duplicates, add it to a list of comparator columns.
- Create a new string to hold the SQL "join" clause text.
- Iterate over the comparator columns (primary operand) and the sorted columns (secondary operand).
- If the primary operand and the secondary operand have the same column name, but are from different tables, then build the "join" clause based on the operands.
- Store the join clause in a list.
Improvements are welcome.
精彩评论