Guessing tags based on content in PHP
I need to write a script that guesses tags based on text content.
Consider this sentence as our story:
stack overflow internet services, inc; user contributions licensed under cc-wiki with attribution required
Now suppose we have some tags in our database table, such as:
Internet, License, service
We need a script that guesses which tags fit the content above, so that there is no need to type tags when you write a story; the script guesses them for you.
Here is what I have so far in PHP:
$content = " stack overflow internet services, user contributions licensed under cc-wiki with attribution required and internet is a good service ";
$outcome = "";
$result = $db->sql_query("SELECT tag FROM table_tags");
while ($row = $db->sql_fetchrow($result)) {
    $tag_title = $row['tag'];
    $words = explode(" ", $content); // break the sentence into words on spaces
    for ($i = 0; $i < sizeof($words); $i++) {
        if ($words[$i] == $tag_title) {
            $outcome .= "$words[$i]-";
        }
    }
}
echo $outcome;
The problem:
It repeats tags, so the outcome would be this:
internet - internet
How about turning it on its head a little?
Why not push more of the work into the SQL statement itself? The loop that builds the statement runs once per word in the content, which will usually be far fewer iterations than looping over every row returned by an unfiltered query, so it should be faster.
$content = " stack overflow internet services, user contributions licensed under cc-wiki with attribution required and internet is a good service ";
$words = explode(" ", trim($content)); // break the sentence into words on spaces
$sql = "SELECT `tag` FROM table_tags WHERE ";
for ($i = 0; $i < sizeof($words); $i++) {
    // note: mysql_real_escape_string() was removed in PHP 7; use your DB layer's escaping there
    $sql .= " `tag` = '" . mysql_real_escape_string($words[$i]) . "'";
    if ($i != sizeof($words) - 1) {
        $sql .= " OR ";
    }
}
$result = $db->sql_query($sql);
// returned rows will now ONLY be matching tags
$tag_titles = array();
while ($row = $db->sql_fetchrow($result)) {
    $tag_titles[] = $row['tag']; // collect every match instead of overwriting one variable
}
print_r($tag_titles);
So, if you had a recordset of 1000 rows (tags in your DB) and only 4 potential tags (words in your title), looping through the rows in PHP as in the other proposals means the loop has to run 1000 times just to identify 4 possible matches. If you move the matching into the SQL, the loop only has to run 4 times to build the filter, which is far more efficient. It also automatically prevents duplicates in the result; if duplicate tags exist in your DB, simply append 'GROUP BY tag' to $sql.
N.B. As per the comment below, IN can be used instead of OR:
$sql = "SELECT `tag` FROM table_tags WHERE `tag` IN (";
for ($i = 0; $i < sizeof($words); $i++) {
    $sql .= "'" . mysql_real_escape_string($words[$i]) . "'";
    if ($i != sizeof($words) - 1) {
        $sql .= ", ";
    }
}
$result = $db->sql_query($sql . ")"); // close the IN (...) list
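If your database layer supports prepared statements, the same IN filter could be built with placeholders rather than escaping each word by hand. A minimal sketch, assuming a hypothetical helper called buildTagQuery() (not part of the code above) whose output you would hand to something like PDO's prepare() and execute():

```php
<?php
// Hypothetical helper: builds an IN (...) query with one "?" placeholder per
// distinct word, and returns it together with the matching parameter list.
function buildTagQuery(array $words): array
{
    // Drop empty strings and repeated words, then reindex from 0.
    $words = array_values(array_unique(array_filter($words, 'strlen')));

    if (count($words) === 0) {
        // An empty IN () is a syntax error, so signal "nothing to query".
        return [null, []];
    }

    $placeholders = implode(', ', array_fill(0, count($words), '?'));
    $sql = "SELECT `tag` FROM table_tags WHERE `tag` IN ($placeholders)";

    return [$sql, $words];
}

$content = " stack overflow internet services and internet is a good service ";
[$sql, $params] = buildTagQuery(explode(' ', trim($content)));
echo $sql, "\n";
```

Removing repeated words first also keeps the statement short when the same word appears many times in the content.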
Try this:
$content = " stack overflow internet services, user contributions licensed under cc-wiki with attribution required and internet is a good service ";
$found_tags = array();
$result = $db->sql_query("SELECT tag FROM table_tags");
while ($row = $db->sql_fetchrow($result)) {
    $tag_title = $row['tag'];
    $words = explode(" ", $content); // break the sentence into words on spaces
    for ($i = 0; $i < sizeof($words); $i++) {
        if ($words[$i] == $tag_title) {
            //$outcome .="$words[$i]-";
            // keyed by the word itself, so a repeated match overwrites its own entry
            $found_tags[$words[$i]] = $words[$i];
        }
    }
}
$outcome = implode(' - ', $found_tags);
echo $outcome;
free terms extractor
Could you add all of your words into an array, but then check whether your individual word exists in the array before adding it?
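// $outcome must be initialised as an array (e.g. $outcome = array();) before the loop
if ($words[$i] == $tag_title)
{
    // in_array() takes the needle first, then the haystack
    if (!in_array($words[$i], $outcome))
    {
        $outcome[] = $words[$i];
    }
}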
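As a rough standalone sketch of that check, with a hard-coded $tag standing in for one row from the database (an assumption made only so the snippet runs without a database):

```php
<?php
$words = explode(' ', 'internet services and internet is a good service');
$tag = 'internet'; // stand-in for one tag row from the database

$outcome = [];
foreach ($words as $word) {
    // Only add a matching word if it is not already in the result.
    if ($word === $tag && !in_array($word, $outcome, true)) {
        $outcome[] = $word;
    }
}

echo implode(' - ', $outcome), "\n"; // prints "internet"
```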
Put the words in an array first then just loop through your tags. This will prevent duplicates and speed your process up significantly:
$words = explode(" ", $content);
$found_tags = array();
while ($row = $db->sql_fetchrow($result)) {
    $tag_title = $row['tag'];
    if (in_array($tag_title, $words)) {
        $found_tags[] = $tag_title;
    }
}
Note you don't need an index. Using [] will cause PHP to use the next index for your array.
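One thing to watch with this approach: explode(" ", ...) leaves punctuation attached to words ("services," is not "services"), and the comparison is case-sensitive, so a tag stored as "Internet" will not match the word "internet". A minimal sketch of one way around that, assuming a hypothetical guessTags() function, ASCII text, and a plain array of tags in place of the database rows:

```php
<?php
// Hypothetical helper: returns the tags that occur as whole words in $content,
// ignoring case and surrounding punctuation.
function guessTags(string $content, array $tags): array
{
    // Split on anything that is not a lowercase letter, digit or hyphen.
    $words = preg_split('/[^a-z0-9-]+/', strtolower($content), -1, PREG_SPLIT_NO_EMPTY);

    // Flip so each word becomes a key: fast isset() lookups, repeats collapse.
    $lookup = array_flip($words);

    $found = [];
    foreach ($tags as $tag) {
        if (isset($lookup[strtolower($tag)])) {
            $found[] = $tag;
        }
    }
    return $found;
}

$content = "stack overflow internet services, inc; user contributions licensed "
         . "under cc-wiki with attribution required and internet is a good service";
echo implode(' - ', guessTags($content, ['Internet', 'License', 'service'])), "\n";
// prints "Internet - service"
```

"License" is not returned because the text only contains "licensed"; matching word variants like that would need stemming or a looser comparison, which this sketch does not attempt.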