Debugging livelock in Django/Postgresql

2022-12-15 10:05 问答作者：

I run a moderately popular web app on Django with Apache2, mod_python, and PostgreSQL 8.3 with the postgresql_psycopg2 database backend. I'm experiencing occasional livelock, identifiable when an apache2 process continually consumes 99% of CPU for several minutes or more.

I did an strace -ppid on the apache2 process, and found that it was continually repeating these system calls:

sendto(25, "Q\0\0\0SSELECT (1) AS \"a\" FROM \"account_profile\" WHERE \"account_profile\".\"id\" = 66201 \0", 84, 0, NULL, 0) = 84
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
poll([{fd=25, events=POLLIN|POLLERR, revents=POLLIN}], 1, -1) = 1
recvfrom(25, "E\0\0\0\210SERROR\0C25P02\0Mcurrent transaction is aborted, commands ignored until end of transaction block\0Fpostgres.c\0L906\开发者_开发知识库0Rexec_simple_query\0\0Z\0\0\0\5E", 16384, 0, NULL, NULL) = 143

This exact fragment repeats continually in the trace, and was running for over 10 minutes before I finally killed the apache2 process. (Note: I edited this to replace my previous strace fragment with a new one that shows full the full string contents rather than truncated.)

My interpretation of the above is that django is attempting to do an existence check on my table account_profile, but at some earlier point (before I started the trace) something went wrong (SQL parse error? referential integrity or uniqueness constraint violation? who knows?), and now Postgresql is returning the error "current transaction is aborted". For some reason, instead of raising an Exception and giving up, it just keeps retrying.

One possibility is that this is being triggered in a call to Profile.objects.get_or_create. This is the model class that maps to the account_profile table. Perhaps there is something in get_or_create that is designed to catch too broad a set of exceptions and retry? From the web server logs, it appears that this livelock might have occurred as a result of a double-click on the POST button in my site's registration form.

This condition has occurred a couple of times over the past few days on the live site, and results in a significant slowdown until I intervene, so pretty much anything other than infinite deadlock would be an improvement! :)

This turned out to be entirely my fault. I found the spot where the select (1) as 'a' statement seemed to originate (in django/models/base.py) and hacked it to log a traceback, which pointed clearly at my code.

I had some code that makes up a unique email "key" for each Profile. These keys are randomly generated, so because there is some possibility of overlap, I run it in a try/except within a while loop. My assumption was that the database's unique constraint would cause the save to fail if the key was not unique, and I'd be able to try again.

Unfortunately, in Postgresql you cannot simply try again after an integrity error. You have to issue a COMMIT or ROLLBACK command (even if you're in autocommit mode, apparently) before you can try again. So I had an infinite loop of failing save attempts where I was ignoring the error message.

Now I look for a more specific exception (django.db.IntegrityError) and run a limited number of attempts so that the loop is not infinite.

Thanks to everyone for viewing/answering.

Your analysis sounds pretty good. Clearly it's not picking up the fact that the transaction is aborted. I suggest you report this as a bug to the django project...

继续阅读：django livelock postgresql psycopg2 transactions

Debugging livelock in Django/Postgresql

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？