Problems with query time when accessing to information stored in the computer memory
I am working on a simple program, simulating a small database. And I have a strange problem when reading information, stored in the computer memory. The thing is that the query time is much higher that what was supposed and I can't see why. Let me explain my problem in detail, and in the end, you will find my question in capital letters.
First of all I have a .txt file simulating a database table with random strings separated with "|". Here you have an example of table (with 5 rows and 5 columns).
Table.txt
0|42sKuG^uM|24465\lHXP|2996fQo\kN|293cvByiV
1|14772cjZ`SN|28704HxDYjzC|6869xXj\nIe|27530EymcTU
2|9041ByZM]I|24371fZKbNk|24085cLKeIW|16945TuuU\Nc
3|16542M[Uz\|13978qMdbyF|6271ait^h|13291_rBZS
4|4032aFqa|13967r^\\`T|27754k]dOTdh|24947]v_uzg
This information in a .txt file is read by my program and stored in the computer memory. Here you have the part of the code that read this information from a file and store in the computer.
Code that reads data from the Table.txt file and store it in the computer memory
string ruta_base("C:\\a\\Table.txt"); // Folder where my "Table.txt" is found
string temp; // Variable where every row from the Table.txt file will be firstly stored
vector<string> buffer; // Variable where every different row will be stored after separating the different elements by tokens.
vector<ElementSet>开发者_运维问答; RowsCols; // Variable with a class that I have created, that simulated a vector and every vector element is a row of my table
ifstream ifs(ruta_base.c_str());
while(getline( ifs, temp )) // We will read and store line per line until the end of the ".txt" file.
{
size_t tokenPosition = temp.find("|"); // When we find the simbol "|" we will identify different element. So we separate the string temp into tokens that will be stored in vector<string> buffer
while (tokenPosition != string::npos)
{
string element;
tokenPosition = temp.find("|");
element = temp.substr(0, tokenPosition);
buffer.push_back(element);
temp.erase(0, tokenPosition+1);
}
ElementSet ss(0,buffer);
buffer.clear();
RowsCols.push_back(ss); // We store all the elements of every row (stores as vector<string> buffer) in a different position in "RowsCols"
}
vector<Table> TablesDescriptor;
Table TablesStorage(RowsCols);
TablesDescriptor.push_back(TablesStorage);
DataBase database(1, TablesDescriptor);
After this, comes the IMPORTANT PART. Let's suppose that I want to make a query, and I ask for input. Let's say that my query is row "n", and also the consecutive tuples "numTuples", and the columns "y". (We must say that the number of columns is defined by a decimal number "y", that will be transformed into binary and will show us the columns to be queried, for example, if I ask for columns 54 (00110110 in binary) I will ask for columns 2, 3, 5 and 6). Then I access to the computer memory to the required information and store it in a vector shownVector. Here I show you the part of this code.
Code that access to the required information upon my input
int n, numTuples;
unsigned long long int y;
clock_t t1, t2;
cout<< "Write the ID of the row you want to get more information: " ;
cin>>n; // We get the row to be represented -> "n"
cout<< "Write the number of followed tuples to be queried: " ;
cin>>numTuples; // We get the number of followed tuples to be queried-> "numTuples"
cout<<"Write the ID of the 'columns' you want to get more information: ";
cin>>y; // We get the "columns" to be represented ' "y"
unsigned int r; // Auxiliar variable for the columns path
int t=0; // Auxiliar variable for the tuples path
int idTable;
vector<int> columnsToBeQueried; // Here we will store the columns to be queried get from the bitset<500> binarynumber, after comparing with a mask
vector<string> shownVector; // Vector to store the final information from the query
bitset<500> mask;
mask=0x1;
t1=clock(); // Start of the query time
bitset<500> binaryNumber = Utilities().getDecToBin(y); // We get the columns -> change number from decimal to binary. Max number of columns: 5000
// We see which columns will be queried
for(r=0;r<binaryNumber.size();r++) //
{
if(binaryNumber.test(r) & mask.test(r)) // if both of them are bit "1"
{
columnsToBeQueried.push_back(r);
}
mask=mask<<1;
}
do
{
for(int z=0;z<columnsToBeQueried.size();z++)
{
int i;
i=columnsToBeQueried.at(z);
vector<int> colTab;
colTab.push_back(1); // Don't really worry about this
//idTable = colTab.at(i); // We identify in which table (with the id) is column_i
// In this simple example we only have one table, so don't worry about this
Table selectedTable = database.getPointer().at(0); // It simmulates a vector with pointers to different tables that compose the database, but our example database only have one table, so don't worry ElementSet selectedElementSet;
ElementSet selectedElementSet;
selectedElementSet=selectedTable.getRowsCols().at(n);
shownVector.push_back(selectedElementSet.getElements().at(i)); // We save in the vector shownVector the element "i" of the row "n"
}
n=n+1;
t++;
}while(t<numTuples);
t2=clock(); // End of the query time
float diff ((float)t2-(float)t1);
float microseconds = diff / CLOCKS_PER_SEC*1000000;
cout<<"The query time is: "<<microseconds<<" microseconds."<<endl;
So my question is... why query time is so different depending on the table size??? (it has nothing to do a table with 100 rows and 100 columns, and a table with 10000 rows and 1000 columns). The thing is that when i access to the information, already saved in the computer memory, i access directly to the element I am looking for and not to all the table... so the size of the table is not taken into account and query time should be the same for every query....
Thank you very much for all your help!!! :)
Class definitions
As some of you asked for, I add the definition of the classes Table and ElementSet:
class ElementSet
{
private:
int id;
vector<string> elements;
public:
ElementSet();
ElementSet(int, vector<string>);
int getId();
void setId(int);
vector<string> getElements();
void setElements(vector<string>);
};
class Table
{
private:
vector<ElementSet> RowsCols;
public:
Table();
Table(vector<ElementSet>);
vector<ElementSet> getRowsCols();
void setRowsCols(vector<ElementSet>);
};
class DataBase
{
private:
int id;
vector<Table> pointer;
public:
DataBase();
DataBase(int, vector<Table>);
int getId();
void setId(int);
vector<Table> getPointer();
void setPointer(vector<Table>);
};
class Utilities
{
public:
Utilities();
static bitset<500> getDecToBin(unsigned long long int);
};
Please read up on database techniques, especially Normal Form and "Indices".
IMHO, your code and concepts are overly complicated.
- Simplifying the design simplifies the code.
- Simplifying the code improves readability, robustness and correctness
Index Tables.
There are two primary concerns in your requirements: Searching for data and fetching the data. In most applications, searching for data will consume more time than fetching the data. So, the primary objective is to make the searching as efficient as possible.
Most databases put the data somewhere and create Index Tables. The Index Table is a data structure that makes finding the data easier (and faster). An example of an Index Table is the std::map
container. For a given key, it will return a value. The trick is to make the value a link, handle or pointer to the data correlated to the key. If your data is small enough, you can simplify this to placing the data into the index table.
Organizing the Data
If your application spends a lot of time searching or fetching data, it is probably Data Oriented, or Data Driven. In either case the data is important, so change the program design around accessing the data.
In database theory, there is discussion of Normal Forms. These are techniques for simplifying (reducing duplications) the data and making the fetching easier.
The Development Plan
I highly suggest you put the data into a simple container, like std::vector
, and start working with the index tables. Once the index tables are implemented, your program will run better.
If the program has poor performance, change the data's structure from std::vector
to something easier to access.
Check out this reply: At what point is it worth using a database?
Your problems start with the fact that you have get/set methods (bad OO design).
This is compounded as you return copies of everything (unlike in Java/C# an object when return/passed to a function is deep copied). To pass a reference you need explicitly inidicate that you are passing a reference rather than copying the object:
class Table
{
// When this returns you are making a copy of the whole vector and returning.
// Thus each access makes a complete new copy of what table is holding.
vector<ElementSet> getRowsCols();
// What you meant was:
vector<ElementSet>& getRowsCols();
// ^^^
// This means return a reference to the object.
// This is what you meant. It is still not a good design
// See below.
// By having a set you are binding your implementation to use a vector of elements
// Try and change this in the future you will have a headache.
void setRowsCols(vector<ElementSet>);
};
class ElementSet
{
int getId();
void setId(int); // An element can change its ID???????
// Same problem as before.
vector<string> getElements();
void setElements(vector<string>);
};
A better design:
Note: This is not supposed to solve your exact problem. Just used to point out some design principles that are similar to yours.
class Row
{
int id;
std::vector<std::string> elements;
// Build a row from a stream.
// It is the responcability of the caller to make sure
// This is a single line (as used by table)
// Note this is why the constructor is in the private section
// So that only people that know this condition actuall call it.
friend std::istream& operator>>(std::istream& str, Table& t);
Row(std::istream& str)
{
str >> id;
str.ignore(std::numeric_limits<std::streamsize>::max(), '|');
std::string element;
while(std::getline(str, element, '|'))
{
elements.push_back(element);
}
}
public:
// Retrieve elements by reference
// Without exposing implementation details.
int getId() const { return id;}
std::string& getElement(int row) { return elements[row];}
std::string const& getElement(int row) const { return elements[row];}
};
class Table
{
std::vector<Row> rows;
// A table knows how to read itself from a stream
// Because of the way streams work it can't be a member but
// we implement it as a friend function.
//
// I have started to put this inside the class to show that
// that two are tightly coupled and as such part of tables public API
friend std::istream& operator>>(std::istream& str, Table& t)
{
std::string line;
while(std::getline(str, line))
{
// For each line we get create a row
// Without exposing how Row is implemented.
std::stringstream linestream(line);
Row row(linestream);
t.rows.push_back(row);
}
return str;
}
public:
// Retrieve individual rows
// Without exposing how rows are actually stored.
// Added const version for completeness.
Row& getRow(int row) { return rows[row];}
Row const& getRow(int row) const { return rows[row];}
};
int main()
{
Table table;
std::ifstream file("data");
file >> table; // load table from file.
// Set table[1][2] to "Plop"
table.getRow(1).getElement(2) = "Plop";
}
One thing that's suspect to me: You're iterating over a bitset<5000>
5,000 times. If you're set on using a bitset, you may want to consider using boost::dynamic_bitset. This way, you could size the masks to the number of columns a table actually has.
If this particular code is executed many times (and I suspect it is), this could be a cheap and easy change that should have a significant impact on performance.
精彩评论