MySQL performance issue with code and table design

I need some options.

I have a table laid out as follows, with about 78,000,000 rows:

  • id INT (Primary Key)
  • loc VARCHAR (Indexed)
  • date VARCHAR (Indexed)
  • time VARCHAR
  • ip VARCHAR
  • lookup VARCHAR

Here is an example of a query I have.

SELECT lookup, date, time, count(lookup) as count FROM dnstable
WHERE STR_TO_DATE(`date`, '%d-%b-%Y') >= '$date1' AND STR_TO_DATE(`date`, '%d-%b-%Y')   <= '$date2' AND
time >= '$hour1%' AND time <= '$hour2%' AND
`loc` LIKE '%$prov%' AND
lookup REGEXP 'ca|com|org|net' AND
lookup NOT LIKE '%.arpa' AND
lookup NOT LIKE '%domain.ca' AND 
ip NOT LIKE '192.168.2.1' AND
ip NOT LIKE '192.168.2.2' AND
ip NOT LIKE '192.168.2.3'
GROUP BY lookup
ORDER BY count DESC
LIMIT 100

I have my MySQL server configured like a few high-usage examples I found. The hardware is good: 4 cores and 8 GB of RAM.

This query takes about 180 seconds. Does anyone have tips on making it more efficient?


There are a lot of things wrong here. A LOT of things. I would look to the other answers for query options (you use a lot of LIKE, NOT LIKE, and function calls, and you apply them to columns that are either unindexed or wrapped in a way that keeps MySQL from using their index). If I were in your position, I'd redesign the entire database. It looks as though you're using this to store DNS entries: host names to IP addresses.

You may not have the option to redesign your database - maybe it's a customer database or something that you don't have control over. Maybe they have a lot of applications which depend on the current database design. However, if you can refactor your database, I would strongly suggest it.

Here's a basic rundown of what I'd do:

  1. Store the TLDs (top-level domains) in a separate column as an ENUM. Index it, so it's easily searchable, instead of running a regex for .com, .arpa, etc. TLDs are limited anyway, and they don't change often, so this is a great candidate for an ENUM.

  2. Store the domain without the TLD in a regular column and in a reversed column. You could index both columns, but depending on your searches, you may only need to index the reversed one. A reversed column lets you search for all hosts in one domain (e.g. google) without scanning every string for a match: MySQL can do an indexed prefix search on the string "elgoog" in the reversed column. Because DNS is a hierarchy, this fits perfectly.

  3. Change the date and time columns from VARCHAR to DATE and TIME, respectively. This one's an obvious change. No more STR_TO_DATE calls in the WHERE clause; there is absolutely no point in converting on every query.

  4. Store the IP addresses differently. There's no reason to use a VARCHAR here; it's inefficient and doesn't make sense. Instead, use four separate columns, one per octet (this is safe because every IPv4 address has exactly four octets), as unsigned TINYINT values. That gives you 0-255, the range you need (each octet is 8 bits anyway). This should make searches much faster, especially if you index the columns.

    Example: SELECT * FROM dnstable WHERE octet1 != 10; (this filters out the entire 10.0.0.0/8 private address space)
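
As a rough illustration of points 2 and 4, the split could be done in application code before rows are inserted. This is only a sketch: the function names and the keys tld, domain, and domain_rev are hypothetical and not part of the original schema.

```python
def split_lookup(lookup):
    """Split a host name into the pieces the redesign would store (hypothetical column names)."""
    name = lookup.rstrip(".").lower()
    # Everything after the last dot is the TLD; the rest is the domain part.
    domain, _, tld = name.rpartition(".")
    return {
        "tld": tld,
        "domain": domain,
        # Reversed copy: a suffix search on `domain` becomes a prefix search here.
        "domain_rev": domain[::-1],
    }


def split_ip(ip):
    """Return the four octets of a dotted IPv4 string as integers."""
    octets = tuple(int(part) for part in ip.split("."))
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not an IPv4 address: {ip!r}")
    return octets


row = split_lookup("www.google.com")
octet1, octet2, octet3, octet4 = split_ip("192.168.2.1")
```

With the reversed value stored and indexed, "all hosts under google" becomes a prefix match such as domain_rev LIKE 'elgoog.%', which an ordinary index can serve.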

The basic problem here is that your database design is flawed: the query filters on columns that aren't indexed (or that can't use their index), and the predicates themselves are inefficient.

If you're stuck with the current design, I'm not sure I can really help you. I'm sorry.


I bet the really big problem here is the STR_TO_DATE calls. If possible, change the date column to a real date datatype (DATE, DATETIME, or TIMESTAMP).

Indexing this new or altered column would speed up selection on it significantly. You have to avoid the date parsing that the wrong datatype of the 'date' column currently forces on you. This parsing/conversion prevents MySQL from using the index on the 'date' column.

Conclusion: give the 'date' column a DATE datatype, index it, and do not use STR_TO_DATE in your statement.
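
A minimal sketch of the conversion on the application side, assuming the stored strings follow the '%d-%b-%Y' pattern from the question (for example 05-Mar-2011) with English month abbreviations. The helper name to_mysql_date is made up for illustration.

```python
from datetime import date

# English month abbreviations, matching MySQL's %b format specifier.
MONTHS = {"jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
          "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12}


def to_mysql_date(text):
    """Convert 'DD-Mon-YYYY' into the 'YYYY-MM-DD' form a DATE column expects."""
    day, month, year = text.strip().split("-")
    # date() validates the values, so an impossible day raises ValueError.
    return date(int(year), MONTHS[month.lower()], int(day)).isoformat()


converted = to_mysql_date("05-Mar-2011")
```

The converted strings also sort chronologically, which the original 'DD-Mon-YYYY' strings do not. That is one reason range conditions on the VARCHAR column need the per-row conversion in the first place.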


I assume these local IP addresses are not very selective when used with negation, right? (This depends on the typical data in the table.) Since the ip column is not indexed, selections on that column always result in a full table scan. If the inequality (<>) selection on ip were very selective, you could consider putting an index on it and changing the statement to use <> instead of LIKE. But I do not think the inequality selection on ip is very selective.

Conclusion: I do not think you can win anything significant here.


The problem is that a LIKE with a leading wildcard (such as '%$prov%') means a full table traversal, which is why you are seeing this. The first thing I would suggest is to get rid of LIKE '192.168.2.1', since without wildcards that is really the same as = '192.168.2.1'. Also, the LIMIT 100 at the end means that the query runs against all the records and only then keeps the first 100. How about instead running one SELECT that applies all the other conditions but not the LIKE filters, limiting that one, and then running a second SELECT over its result that uses LIKE?


Some tips

  • Use != instead of NOT LIKE where the pattern contains no wildcards
  • Avoid REGEXP in MySQL queries
  • Avoid STR_TO_DATE(date, '%d-%b-%Y') >= '$date1'; pass MySQL-formatted dates to the query rather than converting the column with STR_TO_DATE
  • lookup should be indexed if you have to GROUP BY it.

Try caching the query results (if possible).
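
The first and third tips could be combined roughly as follows. This is a sketch that assumes the date and time columns have already been converted to DATE and TIME, and that the caller passes bounds already in MySQL format. The function name is hypothetical, and the loc and lookup filters from the question are left out for brevity.

```python
def build_top_lookups_query(date1, date2, hour1, hour2, excluded_ips=()):
    """Return (sql, params) for the top-100 lookups in a date/time window."""
    where = [
        "`date` BETWEEN %s AND %s",  # plain range on the column, no STR_TO_DATE
        "`time` BETWEEN %s AND %s",
    ]
    params = [date1, date2, hour1, hour2]
    if excluded_ips:
        # One placeholder per address; exact matches need no LIKE.
        marks = ", ".join(["%s"] * len(excluded_ips))
        where.append(f"ip NOT IN ({marks})")
        params.extend(excluded_ips)
    sql = (
        "SELECT lookup, COUNT(lookup) AS cnt FROM dnstable WHERE "
        + " AND ".join(where)
        + " GROUP BY lookup ORDER BY cnt DESC LIMIT 100"
    )
    return sql, params


sql, params = build_top_lookups_query(
    "2011-03-01", "2011-03-31", "08:00:00", "17:59:59",
    ["192.168.2.1", "192.168.2.2", "192.168.2.3"],
)
```

The pair can then be handed to cursor.execute(sql, params) in a driver that uses %s placeholders, such as PyMySQL or mysqlclient. Passing the values as parameters instead of interpolating them into the string also avoids the quoting problems of the '$date1' style.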
