Generate SQL JOIN statements

2023-02-26 02:44 问答作者：

Background

The following data is a specific example of a general problem (sorted by column name):

Primary Key: as_comp.companynamecurrent.companyid
Primary Key: as_comp.companylocation.companyid
Primary Key: as_comp.companylocation.locationid
Primary Key: as_hr.personemploymentcurrent.locationid
Primary Key: as_hr.personnamecurrent.personid
Primary Key: as_hr.personemploymentcurrent.personid
Primary Key: as_hr.personaddresscurrent.personid
Primary Key: as_hr.personemploymentcurrent.positionid

In actuality, the number of primary keys and their relationships is dynamic (unknown at run-time). The code that produces the listing of unique primary keys resembles the following:

// Remove duplicate columns by using a set.
//
Set<String> joins = new HashSet<String>( bundles.size() );

for( Bundle b : bundles.keySet() ) {
  joins.add( b.getColumn() );
}

// Match the primary keys in the bundles.
//
for( Bundle b : bundles.keySet() ) {
  if( joins.contains( b.getColumn() ) ) {
    System.out.println( "Primary Key: " + b.toString() );
  }
}

Each Bundle (as in "Resource Bundle" item for internationalization) instance contains the schema, table (or view), column name, and other information used to uniquely identify a database column and facilitate translation. Each Bundle can answer whether it is a primary key constraint via the isJoin() method:

boolean bundle.isJoin()

Problem

Given the data in the previous section, create a set of JOIN clauses as follows:

join as_comp_companylocation as_comp_companylocation on (as_hr_personemploymentcurrent.locationid == as_comp_companylocation.locationid)
join as_hr_personnamecurrent as_hr_personnamecurrent on (as_hr_pers开发者_开发知识库onemploymentcurrent.personid == as_hr_personnamecurrent.personid)
join as_hr_personaddresscurrent as_hr_personaddresscurrent on (as_hr_personemploymentcurrent.personid == as_hr_personaddresscurrent.personid)
join as_comp_companynamecurrent as_comp_companynamecurrent on (as_comp_companylocation.companyid == as_comp_companynamecurrent.companyid)

Note that positionid can be dropped as it is not needed for joining.

Updates

The left-side comparison condition operand are the primary keys whose corresponding table (or view) name is listed twice (or more). This yields:

companylocation
personemploymentcurrent

Question

How would you create the JOIN statements using the given data set?

Many thanks.

This need not be that hard.

What you basically have is a graph of tables with the edges describing how the tables are joined.

Then when you have the list of tables to be joined, you start and one and start searching the graph. As you walk the graph, keep track of the join criteria, and the visited nodes.

I would exhaustively search the graph, which could well result in several paths that link all the tables up. Simply choose the shortest path.

Finally, use the table names as a key and cache the results, then you will only have to search the graph for each unique set of tables. No real magic here I don't think, and you would have to have a LOT of tables before this would probably take any really noticeable time. It also give you the benefit of being able to select two tables that are not directly joined, the path will find any intermediate table necessary to make the connection.

If it's as static as you described, I'd prepare the queries statically, and look them up depending in the clusters selected. I.e. no algorithm involved.

Too simple? KISS :)

The algorithm turned out to be fairly in-depth. Essentially:

Obtain the list of column mappings.
If a column is a primary key, put it into a map based on its uniquely identifiable name.
Remove duplicate primary keys.
Sort the primary keys.
Create a new map to count (tally) the number of times a primary key appears.
Create a "duplicate list" to store the primary key columns that occur more than once.
Iterate over the sorted primary keys.
Add every primary key that occurs more than once to the "duplicate list".
Iterate over the duplicate primary keys.
Iterate over all the join columns.
If the column equals one of the duplicates, add it to a list of comparator columns.
Create a new string to hold the SQL "join" clause text.
Iterate over the comparator columns (primary operand) and the sorted columns (secondary operand).
If the primary operand and the secondary operand have the same column name, but are from different tables, then build the "join" clause based on the operands.
Store the join clause in a list.

Improvements are welcome.

继续阅读：algorithm join primary-key sql

Generate SQL JOIN statements

Background

Problem

Updates

Question

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

Background

Problem

Updates

Question

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？