开发者

How to resolve this MySQL query?

I have a table that looks like this:

    CREATE TEMPORARY TABLE MainList (
  `pTime` int(10) unsigned NOT NULL,
  `STD` double NOT NULL,
  PRIMARY KEY (`pTime`)
) ENGINE=MEMORY;


+------------+-------------+
| pTime      | STD         |
+------------+-------------+
| 1106080500 |  -0.5058072 |
| 1106081100 | -0.82790455 |
| 1106081400 | -0.59226294 |
| 1106081700 | -0.99998194 |
| 1106540100 | -0.86649279 |
| 1107194700 |  1.51340543 |
| 1107305700 |  0.96225296 |
| 1107306300 |  0.53937716 |
+------------+-------------+ .. etc

pTime is my primary key.

I want to make a query that, for every row in my table, will find the first pTime where STD has a flipped sign and is further away from 0 than STD of the above table. (For simplicity's sake, just imagine that I am looking for 0-STD)

Here is an example of the output I want:

+------------+-------------+------------+-------------+
| pTime      | STD         | pTime_Oppo | STD_Oppo    |
+------------+-------------+------------+-------------+
| 1106080500 |  -0.5058072 | 1106090400 |  0.57510881 |
| 1106081100 | -0.82790455 | 1106091300 |  0.85599817 |
| 1106081400 | -0.59226294 | 1106091300 |  0.85599817 |
| 1106081700 | -0.99998194 | 1106091600 |  1.0660959  |
+------------+-------------+------------+-------------+

I can't seem to get it right! I tried the following:

SELECT DISTINCT
    MainList.pTime,
    MainList.STD,
    b34d1.pTime,
    b34d1.STD
FROM
    MainList
JOIN b34d1 ON(
    b34d1.pTime > MainList.pTime
    AND(
        (
            MainList.STD > 0
            AND b34d1.STD <= 0 - MainList.STD
        )
        OR(
            MainList.STD < 0
            AND b34d1.STD >= 0 - MainList.STD
        )
    )
);

That code just freezes my server up.

P.S Table b34d1 is just like MainList, except it contains much more elements:

mysql>  select STD, Slope from b31d1 limit 10;
+-------------+--------------+
| STD         | Slope        |
+-------------+--------------+
| -0.44922675 |   -5.2016129 |
| -0.11892021 |  -8.15249267 |
|  0.62574686 | -10.19794721 |
|  1.10469057 | -12.43768328 |
|  1.52917352 | -13.08651026 |
|  1.61803899 |  -13.2441349 |
|  1.82686555 | -12.04912023 |
|  2.07480736 | -11.22067449 |
|  2.45529961 |  -7.84090909 |
|  1.86468335 |  -6.26466276 |
+-------------+--------------+
mysql>  select count(*) from b31d1;
+----------+
| count(*) |
+----------+
|   439340 |
+----------+

1 row in set (0.00 sec)

In fact MainList is just a filtered version of b34d1 that uses the MEMORY engine

mysql> show create table b34d1;
+-------+-----------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------+
| Table | Create Table
                                                                                                       |
+-------+-----------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------+
| b34d1 | CREATE TABLE `b34d1` (
  `pTime` int(10) unsigned NOT NULL,
  `Slope` double NOT NULL,
  `STD` double NOT NULL,
  PRIMARY KEY (`pTime`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 MIN_ROWS=339331 MAX_ROWS=539331 PACK_KEYS=1 ROW_FORMAT=FIXED |
+-------+-----------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------+

Edit: I just did a little experiment and I am very confused by the results:

SELECT DISTINCT
    b34d1.pTime,
    b34d1.STD,
    Anti.p开发者_如何学CTime,
    Anti.STD

FROM
    b34d1

LEFT JOIN b34d1 As Anti ON(
    Anti.pTime > b34d1.pTime
    AND(
        (
            b34d1.STD > 0
            AND b34d1.STD <= 0 - Anti.STD
        )
        OR(
            b34d1.STD < 0
            AND b34d1.STD >= 0 - Anti.STD
        )
    )
)  limit 10;

+------------+-------------+------------+------------+
| pTime      | STD         | pTime      | STD        |
+------------+-------------+------------+------------+
| 1104537600 | -0.70381962 | 1104539100 | 0.73473692 |
| 1104537600 | -0.70381962 | 1104714000 | 1.46733274 |
| 1104537600 | -0.70381962 | 1104714300 | 2.02097356 |
| 1104537600 | -0.70381962 | 1104714600 | 2.60642099 |
| 1104537600 | -0.70381962 | 1104714900 | 2.01006557 |
| 1104537600 | -0.70381962 | 1104715200 | 1.97724189 |
| 1104537600 | -0.70381962 | 1104715500 | 1.85683704 |
| 1104537600 | -0.70381962 | 1104715800 |  1.2754127 |
| 1104537600 | -0.70381962 | 1104716100 | 0.87900156 |
| 1104537600 | -0.70381962 | 1104716400 | 0.72957739 |
+------------+-------------+------------+------------+

Why are all the values under the first pTime the same?


Selecting other fields from a row having some aggregate statistic (such as a minimum or maximum value) is a little messy in SQL. Such queries aren't so simple. You typically need an extra join or a subquery. For example:

SELECT m.pTime, m.STD, m2.pTime AS pTime_Oppo, m2.STD AS STD_Oppo
  FROM MainList AS m
    JOIN 
      (SELECT m1.pTime, MIN(m2.pTime) AS pTime_Oppo
         FROM MainList AS m1
           JOIN MainList AS m2 
             ON m1.pTime < m2.pTime AND SIGN(m1.STD) != SIGN(m2.STD)
         WHERE ABS(m1.STD) <= ABS(m2.std)
         GROUP BY m1.pTime
      ) AS oppo ON m.pTime = oppo.pTime
    JOIN MainList AS m2 ON oppo.pTime_Oppo = m2.pTime
;

Using the sample data:

INSERT INTO MainList (`pTime`, `STD`)
  VALUES
(1106080500, -0.5058072),
(1106081100, -0.82790455),
(1106081400, -0.59226294),
(1106081700, -0.99998194),
(1106090400,  0.57510881),
(1106091300,  0.85599817),
(1106091600,  1.0660959),
(1106540100, -0.86649279),
(1107194700,  1.51340543),
(1107305700,  0.96225296),
(1107306300,  0.53937716),
;

The results are:

+------------+-------------+------------+-------------+
| pTime      | STD         | pTime_Oppo | STD_Oppo    |
+------------+-------------+------------+-------------+
| 1106080500 |  -0.5058072 | 1106090400 |  0.57510881 |
| 1106081100 | -0.82790455 | 1106091300 |  0.85599817 |
| 1106081400 | -0.59226294 | 1106091300 |  0.85599817 |
| 1106081700 | -0.99998194 | 1106091600 |   1.0660959 |
| 1106090400 |  0.57510881 | 1106540100 | -0.86649279 |
| 1106091300 |  0.85599817 | 1106540100 | -0.86649279 |
| 1106540100 | -0.86649279 | 1107194700 |  1.51340543 |
+------------+-------------+------------+-------------+


Any solution based on functions like ABS or SIGN or anything similar required to check sign is doomed to be ineffective on big sets of data, because it makes indexing impossible.

You are creating a temporary table inside a SP so you can alter it schema without losing anything, adding a column that stores sign of STD and storing STD itself unsigned will give you HUGE performance boost, because you can simply find first bigger pTime and bigger STD with a different sign and all conditions can use indices in a query like this (STD_positive keeps STD's sign):

SELECT * from mainlist m
LEFT JOIN mainlist mu 
ON mu.pTime = ( SELECT md.pTime FROM mainlist md 
            WHERE m.pTime < md.pTime
            AND m.STD < md.STD
            AND m.STD_positive <> md.STD_positive
            ORDER BY md.pTime
            LIMIT 1 ) 

LEFT JOIN is needed here to return rows that dont have bigger STD. If you don't need them use simple JOIN. This query should run fine even on lots of records, with proper indices based on careful checking of EXPLAIN output, starting with an index on STD.


SELECT
  m.pTime,
  m.STD,
  mo.pTime AS pTime_Oppo,
  -mo.STD AS STD_Oppo
FROM MainList m
  INNER JOIN (
    SELECT
      pTime,
      -STD AS STD
    FROM MainList
  ) mo ON m.STD > 0 AND mo.STD > m.STD
       OR m.STD < 0 AND mo.STD < m.STD
  LEFT JOIN (
    SELECT
      pTime,
      -STD AS STD
    FROM MainList
  ) mo2 ON mo.STD > 0 AND mo2.STD > m.STD AND mo.STD > mo2.STD
        OR mo.STD < 0 AND mo2.STD < m.STD AND mo.STD < mo2.STD
WHERE mo2.pTime IS NULL
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜