开发者

Oracle 'pseduo-fact' view

Assumptions:

  1. I have a number of tables comprised of facts and foreign keys ('dimensional' and 'key-value' type). For example, ENCOUNTER:

ID - primary key

dimensions

  • LOCATION_ID
  • PATIENT_ID

key-value

  • TYPE_ID
  • STATUS_ID
  • PATIENT_CLASS_ID
  • DISPOSITION_ID
  • ...

facts

  • ADMISSION_DATE
  • DISCHARGE_DATE
  • ...

    1. I don't have the option to create a data warehouse
    2. I would like to simplify the data structure for reporting

My approach is to create a number of pseudo-dimensional views ('D_LOCATION' based on the DEPARTMENT and LOCATION tables) and pseudo-fact views ('F_ENCOUNTER' based on ENCOUNTER table). In the pseudo-fact view, I would JOIN the key-value tables (e.g. STATUS, PATIENT_CLASS) to the fact table to include the name fields (e.g. STATUS.NAME, PATIENT_CLASS.NAME).

Questions:

  • If a query selects a subset of all of the fields from F_ENCOUNTER (i.e. not all of the key-value.name fields), is the Oracle 10g optimizer smart enough to exclude some of the key-value table joins (i.e. the ones that aren't included in the query)?
  • Is there anything that I can do to optimize this architecture (other than indices)
  • Is there another approach?

** edit ** Goals (in开发者_JAVA技巧 order of importance):

  • reduce query complexity; increase query consistency; decrease report-development time
  • optimize query-processing
  • minimize administrator burden
  • decrease storage


One optimization suggestion is not to use key-value pair tables. The concept of a Dimension table is that each record should contain all information about that concept without needing to join to normalized tables - i.e. turning a star schema into a snowflake schema.

While values might be repeated across dimension table records, it has the advantage of fewer joins in your reporting queries. Denormalizing tables in this way might seem counter intuitive but where performance is paramount it is usually the best solution.


  • I don't believe Oracle would exclude any joins done in the view, because the joins can impact the number of rows returned. (As when an inner join fails to match any rows, making the whole result set empty.)
  • What are the goals of your optimization? Query speed? query simplicity? storage efficiency? If you can sacrifice storage efficiency for better query performance, then replace the key-value references with the values themselves (TYPE_NAME instead of TYPE_ID, PATIENT_CLASS_NAME instead of PATIENT_CLASS_ID, etc.).
  • [Edit:] If the original architecture cannot be modified, consider using a materialized view. It would essentially pre-compute the joins and store the result set, giving you speedy query time at the cost of extra storage space and possibly-not-fresh data. You can control the latter by specifying an appropriate refresh policy. See http://en.wikipedia.org/wiki/Materialized_view and http://download.oracle.com/docs/cd/B10500_01/server.920/a96520/mv.htm for further details.
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜