
Building an import process that checks for duplicates

Using ASP.NET, I'm building an admin tool that requires a function to import a list of email addresses. Upon uploading the file, I want to check for existing records for any of the email addresses supplied. For non-existing email addresses, I would create them using my DAO.

Basically I want to:

  1. Receive list of emails
  2. Retrieve data for existing emails
  3. Create data for new emails in db
  4. Return full data for all emails in list.

Since I want to know which of the emails exist up front, my first thought was to query the table for all records WHERE Email IN ('Email001FromFile', 'Email002FromFile', 'etc...') but the list could potentially contain thousands of email addresses, and I'm not certain supplying that many email addresses to the IN operator would be a good idea.

I also thought about looping through the list and checking for a record for each email, but that would potentially generate far too many queries.

My next thought was to generate a temp table to hold the list and modify the IN clause to use the temp table, rather than an explicit list of items, but that would require I execute SQL or a stored procedure directly, which I'm not inclined to do since I'm using NHibernate to access my DB.

Though I am using ASP.NET (C#) and NHibernate, and any answers specific to that would be helpful, I'm really just looking for general ideas on how to handle this scenario.


If loading the existing e-mails into memory is not an option, I would go for some kind of batch approach. Use the IN query you mention, but run it for only n e-mails at a time. You could either hardcode n to a fixed value or make it a function of the total number of new e-mails.

I'm not sure whether this approach is actually faster than performing one single IN query (someone with more db skills than me would have to answer that), but it would let you show some kind of loading status to the user.
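
As a rough illustration of the batch idea, here is a minimal sketch in Python using the standard-library sqlite3 module rather than NHibernate. The `users` table, its `email` column, the `find_existing_emails` helper and the batch size are all made up for the example; the same chunking should carry over to whatever query API you use.

```python
import sqlite3


def find_existing_emails(conn, emails, batch_size=500):
    """Return the set of addresses from `emails` already stored in the
    hypothetical `users` table, querying at most `batch_size` at a time."""
    existing = set()
    for start in range(0, len(emails), batch_size):
        batch = emails[start:start + batch_size]
        # One "?" placeholder per address, so values are bound, not inlined.
        placeholders = ",".join("?" * len(batch))
        rows = conn.execute(
            f"SELECT email FROM users WHERE email IN ({placeholders})", batch
        )
        existing.update(row[0] for row in rows)
        # A progress callback could be invoked here after each batch.
    return existing


# Demo data (assumed schema, in-memory database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO users VALUES (?)", [("a@example.com",), ("b@example.com",)]
)

uploaded = ["a@example.com", "c@example.com", "b@example.com", "d@example.com"]
existing = find_existing_emails(conn, uploaded, batch_size=2)
# Addresses that still need to be created, in upload order.
new_emails = [e for e in uploaded if e not in existing]
```

Each batch is one round trip, so a list of a few thousand addresses turns into a handful of queries instead of one per address.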


Are you doing anything with the emails that are duplicates?

You could put a UNIQUE constraint on your table to only allow an email address to be entered once - then catch the exception SQL will throw when you attempt to insert a duplicate.
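
A minimal sketch of that idea, again in Python with sqlite3 instead of NHibernate. The `users` table and the `import_emails` helper are assumptions made for the example, and the exception type you catch will depend on your database driver.

```python
import sqlite3


def import_emails(conn, emails):
    """Try to insert every address; return (created, duplicates).
    Relies on a UNIQUE constraint on the hypothetical users.email column."""
    created, duplicates = [], []
    for email in emails:
        try:
            # "with conn" commits on success and rolls back on an exception.
            with conn:
                conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
            created.append(email)
        except sqlite3.IntegrityError:
            # Raised by sqlite3 when the UNIQUE constraint is violated.
            duplicates.append(email)
    return created, duplicates


# Demo data (assumed schema, in-memory database).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)"
)
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
conn.commit()

created, duplicates = import_emails(
    conn, ["a@example.com", "b@example.com", "b@example.com"]
)
```

Note that this issues one INSERT per address, so it trades the up-front lookup for a statement per row, and it also catches duplicates inside the uploaded file itself.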
