LINQ SubmitChanges runs out of memory
I have a database with about 180,000 records. I'm trying to attach a PDF file to each of those records. Each PDF is about 250 KB in size. However, after about a minute my program is using about a GB of memory and I have to stop it. I tried writing it so that the reference to each LINQ object is dropped once it's updated, but that doesn't seem to help. How can I get it to release the references?
Thanks for your help
Private Sub uploadPDFs(ByVal args() As String)
    Dim indexFiles = (From indexFile In dataContext.IndexFiles
                      Where indexFile.PDFContent = Nothing
                      Order By indexFile.PDFFolder).ToList
    Dim currentDirectory As IO.DirectoryInfo
    Dim currentFile As IO.FileInfo
    Dim tempIndexFile As IndexFile

    While indexFiles.Count > 0
        tempIndexFile = indexFiles(0)
        indexFiles = indexFiles.Skip(1).ToList
        currentDirectory = 'I set the directory that I need
        currentFile = 'I get the file that I need
        writePDF(currentDirectory, currentFile, tempIndexFile)
    End While
End Sub

Private Sub writePDF(ByVal directory As IO.DirectoryInfo, ByVal file As IO.FileInfo, ByVal indexFile As IndexFile)
    Dim bytes() As Byte
    bytes = getFileStream(file)
    indexFile.PDFContent = bytes
    dataContext.SubmitChanges()
    counter += 1
    If counter Mod 10 = 0 Then Console.WriteLine(" saved file " & file.Name & " at " & directory.Name)
End Sub

Private Function getFileStream(ByVal fileInfo As IO.FileInfo) As Byte()
    Dim fileStream = fileInfo.OpenRead()
    Dim bytesLength As Long = fileStream.Length
    Dim bytes(bytesLength) As Byte
    fileStream.Read(bytes, 0, bytesLength)
    fileStream.Close()
    Return bytes
End Function
I suggest you perform this in batches, using Take (before the call to ToList) to process a particular number of items at a time. Read (say) 10, set the PDFContent on all of them, call SubmitChanges, and then start again. (I'm not sure offhand whether you should start with a new DataContext at that point, but it might be cleanest to do so.)
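A minimal sketch of that batching approach, assuming a hypothetical DataContext class named MyDataContext and a hypothetical getPDFBytes helper that locates and reads the PDF for a record:

Private Sub uploadPDFsInBatches()
    Const batchSize As Integer = 10
    Dim more As Boolean = True

    While more
        ' A fresh, short-lived DataContext per batch, so its tracked entities can be collected
        Using dc As New MyDataContext()
            Dim batch = (From indexFile In dc.IndexFiles
                         Where indexFile.PDFContent Is Nothing
                         Order By indexFile.PDFFolder).Take(batchSize).ToList()

            For Each record In batch
                record.PDFContent = getPDFBytes(record) ' hypothetical helper
            Next

            dc.SubmitChanges()

            ' Once PDFContent is set, those rows no longer match the filter,
            ' so the next query automatically picks up the remaining records
            more = batch.Count > 0
        End Using
    End While
End Sub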
As an aside, your code to read the contents of a file is broken in at least a couple of ways - but it would be simpler just to use File.ReadAllBytes in the first place.
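For example, the whole of getFileStream could become a single call to the standard framework method, which opens the file, sizes the array exactly, reads it fully and closes it:

Private Function getFileBytes(ByVal fileInfo As IO.FileInfo) As Byte()
    Return IO.File.ReadAllBytes(fileInfo.FullName)
End Function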
Also, your way of handling the list gradually shrinking is really inefficient - after fetching 180,000 records, you're then building a new list with 179,999 records, then another with 179,998 records etc.
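If you do keep a single up-front list, simply iterating over it avoids rebuilding it on every pass:

For Each tempIndexFile As IndexFile In indexFiles
    ' work out currentDirectory and currentFile as before
    writePDF(currentDirectory, currentFile, tempIndexFile)
Next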
Does the DataContext have ObjectTrackingEnabled set to true (the default value)? If so, then it will try to keep a record of essentially all the data it touches, thus preventing the garbage collector from being able to collect any of it.
If it is, you should be able to fix the situation by periodically disposing of the DataContext and creating a new one, or by turning object tracking off.
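A rough sketch of how that might look, reusing the hypothetical MyDataContext and getPDFBytes names from above and assuming the key column is called IndexFileID; tracking stays off for the read-only query, and each update goes through a short-lived context:

Dim pendingIDs As List(Of Integer)

' Tracking can be switched off for a context that only reads
Using readContext As New MyDataContext()
    readContext.ObjectTrackingEnabled = False
    pendingIDs = readContext.IndexFiles _
        .Where(Function(f) f.PDFContent Is Nothing) _
        .Select(Function(f) f.IndexFileID) _
        .ToList()
End Using

' Each update uses its own context, which is disposed straight away
For Each id In pendingIDs
    Using updateContext As New MyDataContext()
        Dim record = updateContext.IndexFiles.Single(Function(f) f.IndexFileID = id)
        record.PDFContent = getPDFBytes(record) ' hypothetical helper
        updateContext.SubmitChanges()
    End Using
Next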
OK. To use the smallest amount of memory we have to update the DataContext in blocks. I've put some sample code below; it might have syntax errors since I'm using Notepad to type it in.
Dim DB As YourDataContext = New YourDataContext
Dim BlockSize As Integer = 25
Dim AllItems = DB.Items.Where(Function(i) i.PDFfile.HasValue = False).ToList()
Dim count = 0
Dim tmpDB As YourDataContext = New YourDataContext

While count < AllItems.Count
    Dim _item = tmpDB.Items.Single(Function(i) i.recordID = AllItems.Item(count).recordID)
    _item.PDFfile = GetPDF()
    count += 1

    If count Mod BlockSize = 0 Or count = AllItems.Count Then
        tmpDB.SubmitChanges()
        tmpDB = New YourDataContext
        GC.Collect()
    End If
End While
To further optimise the speed, you can get the recordIDs from AllItems into an array as an anonymous type, and turn Delay Loaded on for that PDF field (in the O/R designer) so the initial query never pulls the PDF bytes down.
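A sketch of that first part, keeping the answer's hypothetical Items/PDFfile/recordID names; only the key values are materialised, so no entities or PDF blobs are held in memory up front:

Dim recordIDs = DB.Items _
    .Where(Function(i) i.PDFfile.HasValue = False) _
    .Select(Function(i) i.recordID) _
    .ToArray()

The While loop above can then index into recordIDs instead of AllItems, looking each record up by ID in its short-lived tmpDB context.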