MySQL Normalization stored procedure performance
I've written a stored procedure in MySQL that takes values currently in a table and "normalizes" them. For each value passed to the stored procedure, it checks whether the value is already in the table. If it is, the procedure stores the id of that row in a variable. If the value is not in the table, it inserts the value and stores the id of the newly inserted row. The stored procedure then takes the ids and inserts them into a table that is equivalent to the original denormalized table, but is fully normalized and consists mainly of foreign keys.
My problem with this design is that the stored procedure takes approximately 10 ms to return, which is too long when you're trying to work through some 10 million records. My suspicion is that the performance problem comes from the way I'm doing the inserts, i.e.:
INSERT INTO TableA
(first_value)
VALUES
(argument_from_sp) ON DUPLICATE KEY UPDATE id=LAST_INSERT_ID(id);
SET @TableAId = LAST_INSERT_ID();
The "ON DUPLICATE KEY UPDATE" is a bit of a hack: on a duplicate key I don't want to update anything, just return the id of the existing row. If you leave this step out, though, LAST_INSERT_ID() returns the wrong value when you run the "SET ..." statement.
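To make the idiom above concrete, here is a minimal sketch of how it behaves on both the insert path and the duplicate path. The table definition is an assumption (the original schema is not shown); in particular, the trick only works if first_value carries a UNIQUE key, and uq_first_value and the sample values are hypothetical.

```sql
-- Hypothetical schema: the UNIQUE key on first_value is assumed,
-- since ON DUPLICATE KEY UPDATE only fires on a key collision.
CREATE TABLE TableA (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    first_value VARCHAR(255) NOT NULL,
    UNIQUE KEY uq_first_value (first_value)
);

-- Insert path: 'alpha' is new, so a row is created and
-- LAST_INSERT_ID() reports its auto-generated id.
INSERT INTO TableA (first_value) VALUES ('alpha')
    ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id);
SET @alphaId = LAST_INSERT_ID();

-- Another new value, so the "last insert id" moves on.
INSERT INTO TableA (first_value) VALUES ('beta')
    ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id);

-- Duplicate path: 'alpha' already exists. LAST_INSERT_ID(id) sets the
-- value that the next LAST_INSERT_ID() call returns to the existing
-- row's id, instead of leaving it at the id generated for 'beta'.
INSERT INTO TableA (first_value) VALUES ('alpha')
    ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id);
SET @alphaIdAgain = LAST_INSERT_ID();

-- Both variables should now refer to the same row.
SELECT @alphaId, @alphaIdAgain;
```

The assignment id = LAST_INSERT_ID(id) does not change the row, since it assigns the column its own value; its only purpose is the side effect on what LAST_INSERT_ID() returns next.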
Does anyone know of a better way to do this in MySQL?
I've gone back and created a function to handle this case instead:
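CREATE DEFINER=`root`@`%` FUNCTION `value_update`(inValue VARCHAR(255)) RETURNS int(11)
BEGIN
    -- Stays NULL if the lookup below finds no matching row.
    DECLARE outId INT;
    -- Look for an existing row holding this value.
    SELECT valueId INTO outId FROM ValuesTable WHERE value = inValue;
    IF outId IS NULL THEN
        -- Not found: insert the value and take its new auto-increment id.
        INSERT INTO ValuesTable (value) VALUES (inValue);
        SET outId = LAST_INSERT_ID();
    END IF;
    RETURN outId;
END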
The stored procedure mentioned earlier calls these functions instead of doing the INSERT statements itself. Performance-wise, the above function is faster in my setup (using the ndb table type). Also, after benchmarking all the parts of my application, I found that the performance issues this was causing were only a minor part of the overall bottleneck.
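As a rough illustration of that arrangement, the calling procedure might look something like the sketch below. Everything except value_update is hypothetical: the procedure name, its parameters, NormalizedTable and its columns are placeholders, and a real schema would presumably use one lookup function per value table rather than reusing value_update for both columns.

```sql
DELIMITER $$

-- Hypothetical wrapper: resolves each raw value to an id, then writes
-- one row of foreign keys into the normalized table.
CREATE PROCEDURE normalize_row(
    IN inFirst  VARCHAR(255),
    IN inSecond VARCHAR(255)
)
BEGIN
    DECLARE firstId  INT;
    DECLARE secondId INT;

    -- Each call returns the existing id, or inserts and returns a new one.
    SET firstId  = value_update(inFirst);
    SET secondId = value_update(inSecond);

    -- NormalizedTable and its columns are assumed names.
    INSERT INTO NormalizedTable (first_value_id, second_value_id)
    VALUES (firstId, secondId);
END$$

DELIMITER ;

-- Example call with placeholder values.
CALL normalize_row('alpha', 'beta');
```

The DELIMITER lines are only needed when the definition is entered through the mysql command-line client, so that the semicolons inside the body do not end the CREATE statement early.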
If you already have a unique identifier, is there any need to have an auto-incrementing primary key?