Best way to do a pivot table in SQLite?

I'm using C# and SQLite to slice large amounts of data, and I often need to display my data in pivot table form. I can easily make my pivots dynamic by using C# to create the SQL command from another query, but I still can't decide which way to do the pivoting itself, so I would like to hear some opinions on the matter from programmers more experienced than I am.

I have three methods in mind. Let's say we have a simple table named tData with three columns: "row" represents the row number of that data, "col" represents the column number, and "val" represents the value.

The orthodox method is to use CASE expressions:

SELECT
      row,
      sum(CASE col WHEN 1 THEN val END) AS col1,
      sum(CASE col WHEN 2 THEN val END) AS col2,
      sum(CASE col WHEN 3 THEN val END) AS col3
FROM tData
GROUP BY row
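For illustration, here is a minimal sketch of building this CASE-based query dynamically from a first query, as described above. It uses Python's built-in sqlite3 module as a stand-in for the C# code; the helper name build_case_pivot_sql and the sample data are hypothetical, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

def build_case_pivot_sql(col_values):
    """Build the CASE-based pivot query for the given column numbers."""
    # int() keeps arbitrary text out of the generated SQL string.
    cols = ",\n".join(
        f"      sum(CASE col WHEN {int(c)} THEN val END) AS col{int(c)}"
        for c in col_values
    )
    return f"SELECT\n      row,\n{cols}\nFROM tData\nGROUP BY row\nORDER BY row"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tData (row INTEGER, col INTEGER, val REAL)")
# Hypothetical sample data: (row, col, val)
conn.executemany(
    "INSERT INTO tData VALUES (?, ?, ?)",
    [(1, 1, 10.0), (1, 2, 20.0), (1, 2, 5.0), (2, 1, 7.0), (2, 3, 9.0)],
)

# First query discovers the pivot columns; the second does the pivot.
col_values = [r[0] for r in conn.execute("SELECT DISTINCT col FROM tData ORDER BY col")]
pivot_rows = conn.execute(build_case_pivot_sql(col_values)).fetchall()
print(pivot_rows)
```

Each pivot column appears exactly once in the generated statement, which is what keeps this form easy to generate.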

However, I was thinking it might be faster to ditch the CASE expressions and multiply the value by a logical expression directly, utilizing the fact that true==1 and false==0:

SELECT
      row,
      sum((col=1)*val) AS col1,
      sum((col=2)*val) AS col2,
      sum((col=3)*val) AS col3
FROM tData
GROUP BY row

I suspect this method should be faster, since the CASE expression should have some overhead, but I'm not really sure.
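Speed aside, the two forms are not quite equivalent. The sketch below (Python's sqlite3 module, hypothetical sample data) shows that when a row has no entry for a given column, the CASE form sums only NULLs and returns NULL, while the multiplication form sums zeros and returns 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tData (row INTEGER, col INTEGER, val REAL)")
# Hypothetical data: row 1 has values for col 1 and col 3, but none for col 2.
conn.executemany("INSERT INTO tData VALUES (?, ?, ?)", [(1, 1, 10.0), (1, 3, 30.0)])

# CASE without ELSE yields NULL for non-matching rows; sum() of only NULLs is NULL.
case_result = conn.execute(
    "SELECT row, sum(CASE col WHEN 2 THEN val END) AS col2 FROM tData GROUP BY row"
).fetchall()

# (col=2) evaluates to 0 for non-matching rows, so sum() adds up zeros.
bool_result = conn.execute(
    "SELECT row, sum((col=2)*val) AS col2 FROM tData GROUP BY row"
).fetchall()

print(case_result)  # missing cell comes back as None
print(bool_result)  # missing cell comes back as 0.0
```

Whether an empty cell should read as NULL or as 0 depends on the report, so this may matter more than any speed difference between the two.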

The third method is a bit more complex: it uses JOINs to do the pivoting:

SELECT
      rows.row,
      col1.valSum AS col1,
      col2.valSum AS col2,
      col3.valSum AS col3
FROM
    (SELECT row FROM tData GROUP BY row) AS rows
LEFT JOIN
    (SELECT row,sum(val) AS valSum FROM tData WHERE col=1 GROUP BY row) AS col1
    ON rows.row=col1.row
LEFT JOIN
    (SELECT row,sum(val) AS valSum FROM tData WHERE col=2 GROUP BY row) AS col2
    ON rows.row=col2.row
LEFT JOIN
    (SELECT row,sum(val) AS valSum FROM tData WHERE col=3 GROUP BY row) AS col3
    ON rows.row=col3.row

True, those JOINs have a serious overhead, but in my limited experience, when dealing with large tables, SQL implementations can do simple filter-group-and-sum operations much faster than custom-data-manipulation-on-each-row operations, and that more than makes up for the overhead. The problem is that those kinds of SQL statements are more complex to generate, since each column appears in two places in the statement: once in the fields clause and once in the FROM clause, instead of just in the fields clause as in the first two methods. Plus, I need to be careful with all those temp tables' names.
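To make the generation cost concrete, here is a rough sketch of a generator for the JOIN form, again in Python with sqlite3 standing in for C#. The function name build_join_pivot_sql and the sample data are hypothetical, and it assumes column numbers are non-negative integers so that "col" plus the number is a valid alias.

```python
import sqlite3

def build_join_pivot_sql(col_values):
    """Build the JOIN-based pivot; each column contributes two fragments."""
    select_parts = ["      rows.row"]
    join_parts = []
    for c in col_values:
        alias = f"col{int(c)}"  # assumes non-negative integer column numbers
        # Fragment 1: the entry in the fields clause.
        select_parts.append(f"      {alias}.valSum AS {alias}")
        # Fragment 2: the matching subquery in the FROM clause.
        join_parts.append(
            "LEFT JOIN\n"
            f"    (SELECT row,sum(val) AS valSum FROM tData WHERE col={int(c)} GROUP BY row) AS {alias}\n"
            f"    ON rows.row={alias}.row"
        )
    return "\n".join(
        ["SELECT", ",\n".join(select_parts), "FROM",
         "    (SELECT row FROM tData GROUP BY row) AS rows"]
        + join_parts
        + ["ORDER BY rows.row"]  # added only for a predictable output order
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tData (row INTEGER, col INTEGER, val REAL)")
conn.executemany(
    "INSERT INTO tData VALUES (?, ?, ?)",
    [(1, 1, 10.0), (1, 2, 20.0), (1, 2, 5.0), (2, 1, 7.0), (2, 3, 9.0)],
)
join_rows = conn.execute(build_join_pivot_sql([1, 2, 3])).fetchall()
print(join_rows)
```

Deriving both fragments from the same alias inside one loop is one way to keep the two places in sync.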

So, any opinions?


I would expect the CASE expression approach to perform more quickly than doing as many group-bys-and-joins against your table as there are distinct values in the column in question. The former is CPU-intensive; the latter is disk-intensive. E.g. if the column whose values are to become the column headers contained a day of the week, you'd have seven pivot columns and seven select-group-bys. That could be expensive; it would depend on the size of the table.
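Since the outcome depends on table size, measuring on your own data is the safest way to decide. Below is a minimal timing sketch in Python's sqlite3 using the two queries from the question (with an ORDER BY added so the results can be compared). The synthetic data and the timed helper are assumptions, and an in-memory database will not show the disk cost, so point it at a real database file for meaningful numbers.

```python
import sqlite3
import time

CASE_SQL = """
SELECT row,
       sum(CASE col WHEN 1 THEN val END) AS col1,
       sum(CASE col WHEN 2 THEN val END) AS col2,
       sum(CASE col WHEN 3 THEN val END) AS col3
FROM tData GROUP BY row ORDER BY row
"""

JOIN_SQL = """
SELECT rows.row, col1.valSum AS col1, col2.valSum AS col2, col3.valSum AS col3
FROM (SELECT row FROM tData GROUP BY row) AS rows
LEFT JOIN (SELECT row,sum(val) AS valSum FROM tData WHERE col=1 GROUP BY row) AS col1
    ON rows.row=col1.row
LEFT JOIN (SELECT row,sum(val) AS valSum FROM tData WHERE col=2 GROUP BY row) AS col2
    ON rows.row=col2.row
LEFT JOIN (SELECT row,sum(val) AS valSum FROM tData WHERE col=3 GROUP BY row) AS col3
    ON rows.row=col3.row
ORDER BY rows.row
"""

def timed(conn, sql):
    """Run a query and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, time.perf_counter() - start

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tData (row INTEGER, col INTEGER, val INTEGER)")
# Synthetic data: 1000 distinct rows, 3 columns, integer values.
conn.executemany(
    "INSERT INTO tData VALUES (?, ?, ?)",
    ((i % 1000, i % 3 + 1, i) for i in range(60_000)),
)

case_rows, case_secs = timed(conn, CASE_SQL)
join_rows, join_secs = timed(conn, JOIN_SQL)
print(f"CASE: {case_secs:.4f}s  JOIN: {join_secs:.4f}s  same result: {case_rows == join_rows}")
```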


It looks like you're using the EAV design, which makes it necessary to pivot rows into columns. In a proper relational database design, you would not use EAV. Columns would be columns, and you wouldn't need to pivot.

That said, I understand that EAV is sometimes the lesser of evils and it is a popular design when one needs to store an "extensible" set of attributes in the database.

The most efficient way to get the data back is to forget about doing the pivot in SQL. Just fetch your attributes as multiple rows per given value of row:

SELECT row, col, val FROM tData WHERE row = ...

Then write code in your C# application to loop over the resulting multi-row result set. Create a new object for each distinct row. Set the field of that object named by col to the value val. Then continue fetching the next row of the query result.

This has advantages:

  • The query is simple to write. Only three columns to name in the select-list, no need for column aliases.
  • The query is inexpensive for the RDBMS to execute. No GROUP BY, no self-joins, etc.
  • Still supports the extensible benefit of the EAV design. In fact, it's easier to extend, because when you add new logical columns to your data, you don't have to rewrite your SQL query.
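The loop described above can be sketched as follows, in Python with sqlite3 rather than C#. It fetches all rows instead of filtering on one value of row, and it adds up duplicate (row, col) pairs to mirror the sum() in the question; both choices are assumptions for the sake of the example, as is the sample data.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tData (row INTEGER, col INTEGER, val REAL)")
conn.executemany(
    "INSERT INTO tData VALUES (?, ?, ?)",
    [(1, 1, 10.0), (1, 2, 20.0), (1, 2, 5.0), (2, 1, 7.0), (2, 3, 9.0)],
)

# One dict per distinct row, keyed by column number.
pivot = defaultdict(dict)
for row, col, val in conn.execute("SELECT row, col, val FROM tData"):
    # Accumulate so repeated (row, col) pairs are summed, like sum(val) in SQL.
    pivot[row][col] = pivot[row].get(col, 0) + val

for row in sorted(pivot):
    print(row, pivot[row])
```

Cells with no data are simply absent from the inner dict, so the display code decides whether to show them as blank or as zero.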