开发者

How to join two tables on three conditions?

hope someone can figure out what I'm doing wrong here. The task seems quite simple, but is apparently beyond me.

I have two tables and I am trying to append precipitation data. Three conditions need to match in the two tables to get the correct precipitation data for each field site, e.g. year, latitude, and longitude. I tried by using the following queries (among other failed queries that I can't even remember any more):

SELECT f.*, g.* 
  FROM fieldSites f  LEFT OUTER JOIN gpcp_precipitation2 g
    ON f.date = g.year 
   AND f.d_lat = g.lat
 WHERE f.d_lon = g.lon; 

this one timed out

and:

SELECT *
FROM fieldSites
INNER JOIN gpcp_precipitation2 
  ON (fieldSites.d_lon = gpcp_precipitation2.lon 
  AND fieldSites.d_lat = gpcp_precipitation2.lat 
  AND fieldSites.date = gpcp_precipitation2.year);

this one also timed out on me.

I'd like to dump to a .csv file, but right now I just want to execute a successful query.

Here are my tables:

left table: fieldSites

siteId  d_lat  d_lon   year  data1  data2  country      
  1     -13.75  18.75   2009  0.598  0.351  Angola       
  1     -13.75  18.75   2008  0.654  0.330  Angola       
  1     -13.75  18.75   2007  0.489  0.381  Angola       
  1     -13.75  18.75   2006  0.554  0.389  Angola       
  1     -13.75  18.75   2005  0.321  0.321  Angola       
  1     -13.75  18.75   2004  0.598  0.351  Angola       
  1     -13.75  18.75   2003  0.654  0.330  Angola       
  1     -13.75  18.75   2002  0.489  0.381  Angola       
  1     -13.75  18.75   2001  0.554  0.389  Angola开发者_运维技巧       
  2     -78.75  163.75  2009  0.285  0.155  Antarctica   
  2     -78.75  163.75  2008  0.285  0.155  Antarctica   
  2     -78.75  163.75  2007  0.285  0.155  Antarctica   
  2     -78.75  163.75  2006  0.285  0.155  Antarctica   
  2     -78.75  163.75  2005  0.285  0.155  Antarctica   
...1052 sites, 11 years, 11496 rows

right table: gpcp_precipitation2

siteId   lat    lon   year  precipitation
1        81.5   1.25  2009  93.36571912   
1        81.5   1.25  2008  93.36571912   
1        81.5   1.25  2007  93.36571912   
1        81.5   1.25  2006  93.36571912   
1        81.5   1.25  2005  93.36571912   
1        81.5   1.25  2004  93.36571912   
1        81.5   1.25  2003  93.36571912   
1        81.5   1.25  2002  93.36571912   
1        81.5   1.25  2001  93.36571912   
1        81.5   1.25  2000  93.36571912   
1        81.5   3.75  2009  93.36571912 
1        81.5   3.75  2008  93.36571912   
1        81.5   3.75  2007  93.36571912

... 92300 rows  

What I want is this:

siteId  d_lat  d_lon   year  data1  data2  country      precipitation  
  1     13.75  18.75   2009  0.598  0.351  Angola       144.286
  1     13.75  18.75   2008  0.654  0.330  Angola       114.970
  1     13.75  18.75   2007  0.489  0.381  Angola       70.000
  1     13.75  18.75   2006  0.554  0.389  Angola       174.179
  1     13.75  18.75   2005  0.321  0.321  Angola       174.743
  1     13.75  18.75   2004  0.598  0.351  Angola       70.506
  1     13.75  18.75   2003  0.654  0.330  Angola       173.716
  1     13.75  18.75   2002  0.489  0.381  Angola       74.162
  1     13.75  18.75   2001  0.554  0.389  Angola       139.445
  2     78.75  163.75  2009  0.285  0.155  Antarctica   0
  2     78.75  163.75  2008  0.285  0.155  Antarctica   0
  2     78.75  163.75  2007  0.285  0.155  Antarctica   0
  2     78.75  163.75  2006  0.285  0.155  Antarctica   0

Am I doing something completely stupid? I am stumped. Thanks so much for any advice.


Select fieldSites.*, precipitation.*
From fieldSites
Inner Join gpcp_precipitation2 As precipitation On precipitation.siteId = fieldSites.siteId
Where
    fieldSites.d_year = precipitation.year And
    fieldSites.d_lat = precipitation.lat And
    fieldSites.d_lon = precipitation.lon

If that query is timing out, you have an indexing problem, not necessarily a query problem. This gives you several predicates in the where clause to filter on, so it should reduce your joins quite a bit, but you may need an index that includes siteId, year, lat, and lon on both tables.


Add an index on (date, d_lat, d_lon) to the first table and a (year, lat, lon) index to the second table. Then, try the joins.

From your comments, I suggest you use the second query:

SELECT *
FROM fieldSites
INNER JOIN gpcp_precipitation2 
  ON  fieldSites.d_lon = gpcp_precipitation2.lon 
  AND fieldSites.d_lat = gpcp_precipitation2.lat 
  AND fieldSites.date = gpcp_precipitation2.year ;

Can you also post the query plan for the above, now that you have added some indexes? (use EXPLAIN SELECT ... )


SELECT *
FROM fieldSites
INNER JOIN gpcp_precipitation2 
  ON (fieldSites.d_lon = gpcp_precipitation2.lon 
  AND fieldSites.d_lat = gpcp_precipitation2.lat 
  AND fieldSites.date = gpcp_precipitation2.year);

The last line:

AND fieldSites.date = gpcp_precipitation2.year);

According to your tables this should be fieldSites.year Is that a typo or the error?

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜