C# DataTable (DataRowCollection) stored in a temporary file, not memory?
I would like to replace a DataTable with a custom class that mimics DataRowCollection by storing the rows in a temporary data file instead of keeping them in memory.
I understand that this will be slow compared to in-memory tables, but I occasionally need to work with tables that simply will not fit in RAM (> 4 GB of data). I will discard the table and delete the temporary file at the end of the run.
The table data is coming from a database query. I know that I can change queries to reduce the size of the data set I get back. That is not the point. The point is there will always be some limit on memory and I would like to have the option of using a slow temporary file rather than just saying "you can't do that".
Is there a pre-written class or method of doing this? It seems like I am reinventing the wheel here...
Here is my skeletal start:
/// <summary>
/// like DataTable, but storing data in a file instead of memory
/// </summary>
public class FileBackedDataTable : DataTable, IIntegrationTest
{
new public FileBackedDataRowCollection Rows = null;
// Summary:
// Initializes a new instance of the System.Data.DataTable class with no arguments.
public FileBackedDataTable()
{
Rows = new FileBackedDataRowCollection(this);
}
}
/// <summary>
/// like a DataRowCollection but data is stored in a file, not in memory
/// </summary>
public class FileBackedDataRowCollection : ICollection, IEnumerable, IDisposable
{
/// <summary>
/// internally track each file record
/// </summary>
class recordInfo
{
public long recordPosition;
public int recordLength;
public int recordMaxLength;
public long hash;
}
DataTable table;
ArrayList rows = new ArrayList();
public FileBackedDataRowCollection(DataTable table)
{
this.table = table;
openBackingFile(table);
}
public int Count
{
get { return rows.Count; }
}
public void Clear()
{
rows.Clear();
truncateBackingFile();
}
public DataRow this[int index]
{
get
{
recordInfo info = (recordInfo)rows[index];
return readRow(info);
}
set
{
writeRow(index, value);
}
}
private void writeRow(int index, DataRow value)
{
byte[] bytes = rowToBytes(value);
recordInfo info = (recordInfo)rows[index];
if (bytes.Length <= info.recordMaxLength)
{
info.recordLength = bytes.Length;
info.hash = value.GetHashCode();
writeBytes(info.recordPosition, bytes);
}
else
{
rows[index] = appendRow(bytes, value.GetHashCode());
}
}
private DataRow readRow(recordInfo recordInfo)
{
byte[] bytes = readBytes(recordInfo.recordPosition, recordInfo.recordLength);
DataRow row = bytesToRow(bytes);
return row;
}
public void Add(DataRow r)
{
byte[] bytes = rowToBytes(r);
recordInfo info = appendRow(bytes, r.GetHashCode());
rows.Add(info);
}
private recordInfo appendRow(byte[] bytes, long hash)
{
recordInfo info = new recordInfo();
info.recordLength = bytes.Length;
info.recordMaxLength = info.recordLength;
info.recordPosition = appendBytes(bytes);
info.hash = hash;
return info;
}
// still to do: GetEnumerator, CopyTo and the other ICollection members,
// Dispose, and the private file I/O helpers used above
}
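The missing pieces are the file I/O and serialization helpers. Here is a minimal sketch of what they could look like inside FileBackedDataRowCollection; the specifics are my assumptions, not part of the original: a FileStream over a temp file, and every column round-tripped as a length-prefixed string (only really safe when all columns are typeof(string)). It needs using System.IO;.
// Sketch only: assumes every column serializes as a string.
private FileStream backingFile;
private void openBackingFile(DataTable table)
{
    // Path.GetTempFileName() creates a unique, empty file on disk.
    backingFile = new FileStream(Path.GetTempFileName(), FileMode.Open,
                                 FileAccess.ReadWrite, FileShare.None);
}
private void truncateBackingFile()
{
    backingFile.SetLength(0);
}
private byte[] rowToBytes(DataRow row)
{
    // BinaryWriter length-prefixes each string; DBNull becomes "".
    using (MemoryStream ms = new MemoryStream())
    using (BinaryWriter writer = new BinaryWriter(ms))
    {
        foreach (DataColumn col in table.Columns)
            writer.Write(Convert.ToString(row[col]));
        writer.Flush();
        return ms.ToArray();
    }
}
private DataRow bytesToRow(byte[] bytes)
{
    // Rebuilds a detached row; the caller decides whether to attach it.
    DataRow row = table.NewRow();
    using (MemoryStream ms = new MemoryStream(bytes))
    using (BinaryReader reader = new BinaryReader(ms))
    {
        foreach (DataColumn col in table.Columns)
            row[col] = reader.ReadString();
    }
    return row;
}
private long appendBytes(byte[] bytes)
{
    // Append at end-of-file and return the record's start position.
    long position = backingFile.Seek(0, SeekOrigin.End);
    backingFile.Write(bytes, 0, bytes.Length);
    return position;
}
private byte[] readBytes(long position, int length)
{
    byte[] bytes = new byte[length];
    backingFile.Seek(position, SeekOrigin.Begin);
    backingFile.Read(bytes, 0, length);
    return bytes;
}
private void writeBytes(long position, byte[] bytes)
{
    backingFile.Seek(position, SeekOrigin.Begin);
    backingFile.Write(bytes, 0, bytes.Length);
}
public void Dispose()
{
    // Delete the temp file at the end of the run, as described above.
    string path = backingFile.Name;
    backingFile.Close();
    File.Delete(path);
}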
Recently, I've been looking at System.Data.SQLite to persist some application data instead of writing my own persistence layer.
How about creating a temp file with SQLite and loading your legacy data there? Then you can use it like a local file and delete it after munching.
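A sketch of the idea (my assumptions, not from the original post: the System.Data.SQLite ADO.NET provider, and a made-up one-column schema):
using System;
using System.Data;
using System.Data.SQLite;
using System.IO;
class SQLiteTempTable
{
    static void Main()
    {
        string path = Path.GetTempFileName();   // SQLite happily opens an empty file
        using (var conn = new SQLiteConnection("Data Source=" + path))
        {
            conn.Open();
            // Create a scratch table, then bulk-load it inside one transaction
            // (one implicit transaction per row would be painfully slow).
            using (var create = new SQLiteCommand(
                "CREATE TABLE Rows (Id INTEGER PRIMARY KEY, Payload TEXT)", conn))
                create.ExecuteNonQuery();
            using (var tx = conn.BeginTransaction())
            using (var insert = new SQLiteCommand(
                "INSERT INTO Rows (Payload) VALUES (@p)", conn, tx))
            {
                SQLiteParameter p = insert.Parameters.Add("@p", DbType.String);
                for (int i = 0; i < 1000000; i++)
                {
                    p.Value = "row " + i;
                    insert.ExecuteNonQuery();
                }
                tx.Commit();
            }
            // Query it like any other database; only the current row is in memory.
            using (var select = new SQLiteCommand(
                "SELECT Payload FROM Rows WHERE Id = 42", conn))
            using (var reader = select.ExecuteReader())
                while (reader.Read())
                    Console.WriteLine(reader.GetString(0));
        }
        File.Delete(path);   // delete after munching
    }
}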
Almost certainly your plan is bad design. Spend some time on a redesign: use your database instead of a file; databases were created precisely to manipulate large chunks of data. If needed, you can write stored procedures in C# or another language, if your DB allows that.
Describe the way you want to manipulate your data and you will get a real answer to your real problem. It will almost certainly either require a SQL query, or, if it can't be done in SQL, it can be done in some kind of loop working with a smaller data size.
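For example, if each row can be processed independently, a plain forward-only data reader streams rows one at a time (a sketch, assuming SQL Server; connectionString and ProcessRow are placeholders):
// needs: using System.Data.SqlClient;
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("SELECT * FROM BigTable", conn))
{
    conn.Open();
    using (SqlDataReader reader = cmd.ExecuteReader())
    {
        // Forward-only, read-only: only the current row is in memory,
        // so the > 4 GB result set never has to fit in RAM at once.
        while (reader.Read())
            ProcessRow(reader);
    }
}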
You can use DataTable.WriteXml. But I agree with the others: it is better to limit the records you get from the database in the first place.
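For completeness, the WriteXml round trip looks like this (a sketch; note that the whole DataTable still has to fit in memory once before it can be written out, so this does not by itself solve the > 4 GB problem):
string path = Path.GetTempFileName();
// Persist the data together with the schema so column types survive.
table.WriteXml(path, XmlWriteMode.WriteSchema);
// Later: rebuild the table from the file, then clean up.
DataTable restored = new DataTable();
restored.ReadXml(path);
File.Delete(path);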