MySQL/PHP Eliminating duplicate, non-identical, returns from a MySQL self-join
I have a small database, holding the details of just under 400 ponies. I wish to query that table and return a table showing the pertinant details of each pony, and it's owner's and breeder's names. The data is held primarily like so:
- profiles - a table holding all info assigned to each individual pony, including it's sire's and dam's reg numbers, and it's owner's and breeder's DB assigned id's.
- contacts - a table for the people's info. Joined as 'owner' and again as 'breeder' in the query below.
- prm_* - multiple parameter tables, holding broad details such as colour, breed, etc.
Where I am running into trouble is when trying my first self join: querying the profiles table three times in order to retrieve the names of the sire and dam for each profile, as well as the pony's own name to begin with. When I run the query, it returns duplicate rows for many (not all) profiles. Using DISTINCT eliminated most of these, but the issue remains with the non-identical results, particularly for those ponies where no sire or dam is on record.
I have googled the problem, and it does appear here and there, but I cant quite grasp what happening in the solutions given. I'm not even certain why the problem occurs at all. Can someone please step me through the issue and the solving of it? I'd be most grateful.
My query as it stands (returns 408 results, from only 387 ponies!):
include 'conn.php';
?>
<table class="admin-display">
<thead><tr><th>No:</th><th>Name:</th><th>Sire:</th><th>Dam:</th><th>Age:</th><th>Colour:</th><th>Gender:</th><th>Owner:</th><th>Breeder:</th></tr></thead>
<?php
$i=1;
$sql = mysql_query("SELECT DISTINCT p.ProfileID, p.ProfileOwnerID, p.ProfileBreederID, p.ProfilePrefix, p.ProfileSireReg, p.ProfileDamReg,
p.ProfileGenderID, p.ProfileAdultColourID, p.ProfileColourModifierID, p.ProfileYearOfBirth,
p.ProfileYearOfDeath, p.ProfileLocalRegNumber, p.ProfileName,
sire.ProfileName AS sireName, sire.ProfilePrefix AS sirePrefix,
dam.ProfileName AS damName, dam.ProfilePrefix AS damPrefix,
owner.ContactFirstName AS owner_fname, owner.ContactLastName AS owner_lname,
breeder.ContactFirstName AS breeder_fname, breeder.ContactLastName AS breeder_lname,
BreedGender, BreedColour, BreedColourModifier
开发者_Python百科 FROM profiles AS p
LEFT JOIN profiles AS sire
ON p.ProfileSireReg = sire.ProfileLocalRegNumber
LEFT JOIN profiles AS dam
ON p.ProfileDamReg = dam.ProfileLocalRegNumber
LEFT JOIN contacts AS owner
ON p.ProfileOwnerID = owner.ContactID
LEFT JOIN contacts AS breeder
ON p.ProfileBreederID = breeder.ContactID
LEFT JOIN prm_breedgender
ON p.ProfileGenderID = prm_breedgender.BreedGenderID
LEFT JOIN prm_breedcolour
ON p.ProfileAdultColourID = prm_breedcolour.BreedColourID
LEFT JOIN prm_breedcolourmodifier
ON p.ProfileColourModifierID = prm_breedcolourmodifier.BreedColourModifierID
WHERE p.ProfileName != 'Unknown'
ORDER BY p.ProfileID ASC");
while($row = mysql_fetch_array($sql)) {
$id = $row['ProfileID'];
$name = $row['ProfilePrefix'] . ' ' . $row['ProfileName'];
if ($row['ProfileYearOfDeath'] > 0000) { $age = ($row['ProfileYearOfDeath'] - $row['ProfileYearOfBirth']); }
elseif ($row['ProfileYearOfDeath'] <= 0000) { $age = (date('Y') - $row['ProfileYearOfBirth']); }
$reg = $row['ProfileLocalRegNumber'];
$sire = $row['sirePrefix'] . ' ' . $row['sireName'];
$dam = $row['damPrefix'] . ' ' . $row['damName'];
$colour = $row['BreedColour'];
$gender = $row['BreedGender'];
$owner = $row['owner_fname'] . ' ' . $row['owner_lname'];
$breeder = $row['breeder_fname'] . ' ' . $row['breeder_lname'];
echo '<tr><td>' . $i++ . '</td><td>' . $name . '</td><td>' . $sire . '</td>';
echo '<td>' . $dam . '</td><td>' . $age . '</td><td>' . $colour . '</td><td>' . $gender. '</td>';
echo '<td>' . $owner . '</td><td>' . $breeder. '</td></tr>';
}
echo '</table>';
mysql_close($con);
Use GROUP BY
over DISTINCT
:
http://msmvps.com/blogs/robfarley/archive/2007/03/24/group-by-v-distinct-group-by-wins.aspx
The problem is going to be in the data - one of the tables that you're joining against has multiple rows on associated to the join key.
I recommend executing the query in stages. Start with the base query (taking out the field list):
SELECT count(*)
FROM profiles AS p
WHERE p.ProfileName != 'Unknown'
And then add the join tables in one at a time until you see the count increase...
SELECT count(*)
FROM profiles AS p
LEFT JOIN profiles AS sire
ON p.ProfileSireReg = sire.ProfileLocalRegNumber
WHERE p.ProfileName != 'Unknown'
You should then be able to see where the duplicate is. If you want to easily see which record is duplicated, you can run this query:
SELECT p.Profile_id, count(*) cnt
FROM profiles AS p
LEFT JOIN profiles AS sire
ON p.ProfileSireReg = sire.ProfileLocalRegNumber
-- (all other joins)
WHERE p.ProfileName != 'Unknown'
GROUP BY p.Profile_id
HAVING count(*) > 1
Then you can look at the details of the duplicated records.
精彩评论