How would I split a large set of tabular data into smaller relevant tables? (Not a DB Question)

2023-03-05 01:17 Q&A author:

I'm really hoping I can describe this question in an understandable way. This is a puzzle that I have not been able to begin to solve even though I (mostly) understand it. I'm just not sure where to start, and I'm really hoping someone out there can get me headed in the right direction.

I have a LARGE table of data. It describes relationships between objects. Let's say the Y-axis has items numbered 1-1000, and the X-axis has items 1-1000 also. If item #234 on the Y-axis is related to item #791 on X, there will be a mark in the table where the row and column cross. In some industries this is referred to as a truth table. One can, at a glance, see how many items in a system relate to each other. The marks in the table can help to identify trends and patterns.

Here's some other helpful stuff about the nature of the table:

The full range of the number of relationships (r) for each item on either axis can be 1 <= r <= axisTotal.
The X and Y axis will share common items, but each axis will also have items that the other axis does not.
Each item will only exist once per axis. It can be on X and Y, but it would only be on each one 1 time.
The total number of items on each axis will most likely NOT be equal. Each axis could have from 50 to 1000's of items.

The end result is that this is going to be a report that needs to be printed. We have successfully printed a table that had about 100-150 items on each axis on an 11in X 17in piece of paper. Any more than that and it begins to be so small it's unreadable.

What I am trying to do is split the super large tables into smaller tables, but related points need to stay together. If I grab items 1-100 on X, then I also need every item on Y that they relate to.

I've generated a number of these tables and, while the number of relationships CAN be arbitrary, I have never seen an item relate to all other items. So in real practice the range is more like 1 <= r <= (10% * axisTotal). If an item's relationships exceed this range, it can be split up into multiple tables, but that is not optimal at all.

At the end of the day I think we, and our clients, would be happy if a 1000x1000 item table was split into 8 to 10 printed pages of smaller, related tables.

Any guidance would be a great help! Thanks.
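One possible starting point for "related points need to stay together" is to treat the table as a bipartite graph and look for connected components: any two items that are linked, directly or through a chain of marks, land in the same group. The sketch below is only an illustration, assuming the marks are available as (row, column) pairs; the function name `related_groups` and the input layout are made up for this example.

```python
from collections import defaultdict

def related_groups(pairs):
    """Group (row, col) marks into connected components of the bipartite graph."""
    parent = {}

    def find(node):
        # Walk up to the root, shortening the path as we go.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for row, col in pairs:
        # Tag each item with its axis so item 5 on Y and item 5 on X stay distinct.
        parent[find(("Y", row))] = find(("X", col))

    groups = defaultdict(lambda: (set(), set()))
    for row, col in pairs:
        rows, cols = groups[find(("Y", row))]
        rows.add(row)
        cols.add(col)
    return sorted((sorted(r), sorted(c)) for r, c in groups.values())

# Hypothetical marks: rows 1-2 share columns 10-11, rows 3-4 share column 20.
print(related_groups([(1, 10), (2, 10), (2, 11), (3, 20), (4, 20)]))
```

Each returned group could be printed as its own small table. If the real data turns out to be one big component (quite possible when some items relate to 10% of the other axis), this alone will not get down to 8 to 10 pages and some heuristic splitting would still be needed.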

---EDIT--- One other thing worth noting, there will be no empty rows or columns in the table. Every item on both the x and y axis will relate to at least 1 item on the opposite axis.

---EDIT--- Here is an example of a small truth table that I'm describing:

[Image: example truth table]

Every row and column has at least one relationship.

---EDIT--- May 18th, 2011 For what it's worth, I was moving pretty well on this project and I got pulled off for a couple of weeks. So it's going to be a little while before I get back to this problem. But it is one that I will have to solve soon.

---EDIT--- July 11th, 2011 Bummer. Well, looks like I'm not going to be able to solve this problem right now. I was really hoping to be able to figure this out. Through discussion we decided to present the truth table in an Excel spreadsheet as an add-on resource to the main report. Excel 2007 and later will handle 1000's of columns which will more than suffice. Plus, we added some VBA which allows the viewer to double click on the column titles. This action will reduce the rows to only ones where there are interactions. Then it removes empty columns. In this way they can see a small sub-table based on the item they want to view, and can print it if they want.
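For readers who want the same filtering outside Excel, here is a rough Python sketch of the behavior described above (pick a column, keep only rows that interact with it, then drop the columns left empty). This is not the VBA from the report; the function name `sub_table` and the (row, column) pair layout are assumptions for illustration.

```python
def sub_table(pairs, column):
    """Return (rows, cols, marks) for the sub-table centered on one column."""
    # Keep only the rows that interact with the chosen column.
    rows = {r for r, c in pairs if c == column}
    # Keep the marks on those rows; columns with no mark on them drop out.
    marks = {(r, c) for r, c in pairs if r in rows}
    cols = {c for _, c in marks}
    return sorted(rows), sorted(cols), marks

# Hypothetical data: column "A" interacts with rows 1 and 3.
example = {(1, "A"), (1, "C"), (2, "B"), (3, "A"), (3, "D")}
rows, cols, marks = sub_table(example, "A")
print(rows, cols)
```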

This isn't an answer, I just want to try to visualize your data a little better. Does it kind of look like this?

        Alice  Bob  Charlie ... Zelda
Shoes     X            X
Hats            X                 X
Gloves                 X
...
Pants           X

EDIT

Is it a requirement to show the data in tabular format? Or could you just list each out? Something like:

Alice
- Shoes
Bob
- Hats
- Pants
Charlie
- Shoes
- Gloves
Zelda
- Hats

Or the other way:

Shoes
- Alice
- Charlie
Hats
- Bob
- Zelda
Gloves
- Charlie
Pants
- Bob
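Both list forms can be produced from the same set of marks. A minimal sketch, assuming the marks are stored as (item, person) pairs; only the marks visible in the small example table are used (the rows hidden behind "..." are left out), and `as_lists` is a made-up helper name.

```python
from collections import defaultdict

# Marks from the small example table, as (item, person) pairs.
marks = [
    ("Shoes", "Alice"), ("Shoes", "Charlie"),
    ("Hats", "Bob"), ("Hats", "Zelda"),
    ("Gloves", "Charlie"),
    ("Pants", "Bob"),
]

def as_lists(pairs):
    """Build both lookups: row -> columns and column -> rows."""
    by_row, by_col = defaultdict(list), defaultdict(list)
    for row, col in pairs:
        by_row[row].append(col)
        by_col[col].append(row)
    return dict(by_row), dict(by_col)

by_item, by_person = as_lists(marks)

# Print the "by person" listing; swap in by_item for the other direction.
for person in sorted(by_person):
    print(person)
    for item in by_person[person]:
        print("-", item)
```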

EDIT 2

Okay, I've made another larger truth table to hopefully get a better understanding of how you want to split things up:

   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
 1 x   x     x                             x
 2   x x     x             x         x     x
 3 x               x             x             x
 4         x             x     x
 5   x           x                 x
 6               x             x           x
 7   x             x             x
 8         x               x               x

For argument's sake let's just say that you can only fit 4 rows on a page (because I don't feel like typing out a giant table this early in the morning), so we're going to split this into two pages. First, it is important to show every row, right? Second, do you need to show columns that never have a value? For instance, Y and Z never have a value for rows 1 through 8 in this table. Can they be excluded from the report, or do they still need to be there? Third, is the order of the rows important?

If it's not important to show completely empty columns, then we could remove 10 columns from the table above and compress it down to:

   A B C E F H I L M O P Q R U V W
 1 x   x   x                 x
 2   x x   x       x       x x
 3 x           x       x         x
 4       x       x   x
 5   x       x           x
 6           x       x       x
 7   x         x       x
 8       x         x         x

Then, if row order isn't important, you can compress it further by choosing a good row arrangement (the one shown here is not necessarily optimal). The two tables below have been further compressed to 11 and 10 columns:

  A B C F H I M P Q R U
1 x   x x             x
2   x x x     x     x x
5   x     x       x
7   x       x   x

  A E H I L M O P U W
3 x     x       x   x
4   x     x   x
6     x       x   x
8   x       x     x
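The row grouping above was done by hand. One way to automate it is a simple greedy heuristic: start a page with the busiest remaining row, then keep adding whichever row introduces the fewest new columns. This is only a sketch under stated assumptions: the row data is transcribed from the two page tables above, `pack_rows` is a made-up name, and greedy packing is not guaranteed to find the best arrangement.

```python
# Marks per row, transcribed from the two page tables above.
ROWS = {
    1: set("ACFU"), 2: set("BCFMRU"), 3: set("AIPW"), 4: set("ELO"),
    5: set("BHQ"), 6: set("HOU"), 7: set("BIP"), 8: set("EMU"),
}

def pack_rows(row_cols, rows_per_page):
    """Greedily assign rows to pages so each page needs few distinct columns."""
    remaining = dict(row_cols)
    pages = []
    while remaining:
        # Seed the page with the row that has the most marks (ties: lowest label).
        seed = max(sorted(remaining), key=lambda r: len(remaining[r]))
        page_rows, page_cols = [seed], set(remaining.pop(seed))
        while remaining and len(page_rows) < rows_per_page:
            # Add the row that introduces the fewest new columns to this page.
            best = min(sorted(remaining), key=lambda r: len(remaining[r] - page_cols))
            page_cols |= remaining.pop(best)
            page_rows.append(best)
        pages.append((sorted(page_rows), sorted(page_cols)))
    return pages

for rows, cols in pack_rows(ROWS, rows_per_page=4):
    print("rows", rows, "need", len(cols), "columns:", " ".join(cols))
```

Empty columns drop out for free here, because a page only lists columns that some row on it actually uses. For real tables the page limit would apply to columns as well as rows, so the inner loop would also need to stop once `page_cols` reaches the printable width.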

Am I going down a completely wrong path here? These are all just questions to help me better understand your data and output requirements.

Also, in all seriousness, is it an option to get larger printers/plotters? Or is it an option to just generate a PDF and use Acrobat's tile-printing option?

Last year I read an article in the journal PLoS Computational Biology (www.ploscompbiol.org) that seems related to your problem.

In short, it describes a new approach for the case where we already have a set of proteins and tabular data about their one-to-one interactions, and we want to group them so that interaction inside a group and interaction between two groups is either maximized or (this is the innovative idea) minimized.

If we plot the start data table with black for high and white for low interaction it looks randomly gray. The result table, after the calculations and rearranging is done (so grouped items are placed near one another), looks more like orthogonal areas of black and white.

The article is "Protein Interaction Networks—More Than Mere Modules", which also has references to other, older techniques for grouping this kind of data.
