Sorting a text file
I have this text file (almost 60 MiB):
560000100300100201100001000000000000[...]
560000100400100201100001000000000000[...]
560000100400200201100001000000000000[...]
560000100200100201100001000000000000[...]
I'm writing an app in VB.NET that does some unrelated processing with this file.
At the end of that processing, though, the file is unsorted.
The "keys" are: (they're together)
01003, 01004, 01004, 01002
and
001, 001, 002, 001
Every line starts with 56000, then the first key, then the second key, and then the rest of the line.
I tried to use SORT, which is included with Windows. It does a pretty nice job, but I need to have my own function in case SORT is not available.
The sorted output should start with the line beginning 560001002001.
Any ideas? Ask whatever you need to know.
Thank you.
Don't use the Windows "sort.exe". Use VB.Net instead:
- Read file into a VB.Net string list, a line at a time
- Sort the list
- Write back the sorted file
Here's an example program from MSDN that already does most of the work for you:
    Imports System
    Imports System.IO
    Imports System.Collections

    Module Module1
        Sub Main()
            ' Open the input file and collect its lines in an ArrayList.
            Dim objReader As New StreamReader("c:\test.txt")
            Dim sLine As String = ""
            Dim arrText As New ArrayList()

            ' ReadLine returns Nothing once the end of the file is reached.
            Do
                sLine = objReader.ReadLine()
                If Not sLine Is Nothing Then
                    arrText.Add(sLine)
                End If
            Loop Until sLine Is Nothing
            objReader.Close()

            ' Print the lines (the list is not sorted yet at this point).
            For Each sLine In arrText
                Console.WriteLine(sLine)
            Next
            Console.ReadLine()
        End Sub
    End Module
Here's the documentation for ArrayList.Sort():
http://msdn.microsoft.com/en-us/library/8k6e334t.aspx
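The three steps above (read, sort, write back) could be put together roughly like this. It's only a sketch: both file paths are made up for illustration, and it assumes the whole file fits in memory. Because every line starts with the same 56000 prefix followed directly by the two fixed-width keys, an ordinal sort of the whole lines should already order them by key, with ties broken by the rest of the line.

```vbnet
Imports System
Imports System.IO

Module SortFileSketch
    Sub Main()
        ' Hypothetical paths, used only for illustration.
        Dim inputPath As String = "c:\test.txt"
        Dim outputPath As String = "c:\test_sorted.txt"

        ' Read every line into memory (roughly 60 MiB of text for the file in question).
        Dim lines As String() = File.ReadAllLines(inputPath)

        ' Ordinal comparison sorts by raw character codes, which matches
        ' numeric order as long as the keys are fixed-width digit strings.
        Array.Sort(lines, StringComparer.Ordinal)

        ' Write the sorted lines to a new file.
        File.WriteAllLines(outputPath, lines)
    End Sub
End Module
```

Writing to a separate output file rather than overwriting the input keeps the original around if something goes wrong.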
Hope that helps!
I wanted to comment, but the browser won't let me, so here is an answer on sorting by a range of n characters: see Sort(IComparer) at http://msdn.microsoft.com/en-us/library/0e743hdt.aspx. You write your own compare function, so anything goes.
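A custom comparer for this file might look something like the sketch below. It uses the generic IComparer(Of String) with List(Of String) rather than ArrayList, and it assumes the layout described in the question: a 5-character 56000 prefix, then a 5-character key and a 3-character key. The names KeyComparer, KeyStart and KeyLength are mine, and the sample data is just the visible part of the lines from the question.

```vbnet
Imports System
Imports System.Collections.Generic

' Compares two lines on the 8 key characters that follow the "56000" prefix.
Class KeyComparer
    Implements IComparer(Of String)

    Private Const KeyStart As Integer = 5   ' assumed: keys begin right after "56000"
    Private Const KeyLength As Integer = 8  ' assumed: 5-character key + 3-character key

    Public Function Compare(ByVal x As String, ByVal y As String) As Integer _
        Implements IComparer(Of String).Compare
        ' Throws if a line is shorter than KeyStart characters, so blank
        ' lines would need to be filtered out before sorting.
        Return String.CompareOrdinal(x, KeyStart, y, KeyStart, KeyLength)
    End Function
End Class

Module ComparerSketch
    Sub Main()
        Dim lines As New List(Of String)()
        lines.Add("560000100300100201100001000000000000")
        lines.Add("560000100400100201100001000000000000")
        lines.Add("560000100400200201100001000000000000")
        lines.Add("560000100200100201100001000000000000")

        lines.Sort(New KeyComparer())

        ' The first line printed is the one starting with 560001002001.
        For Each sortedLine As String In lines
            Console.WriteLine(sortedLine)
        Next
    End Sub
End Module
```

Note that List(Of T).Sort is not a stable sort, so lines with identical keys may not keep their original relative order.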
Given the size of the file, you may be better off going 'old school' and using something like DOS SORT to sort the file. I've had to do this for data warehousing, and my own code did not perform as well as a text file sorter.
In a command window (could use a console application, or using ShellExecute on a batch file, or some other way in code), the following command will sort a file according to its contents:
SORT C:\MyFile.CSV /O C:\MyFile_Sorted.CSV
This way, you sort the file as quickly as possible, then read the contents of your sorted file (MyFile_Sorted.CSV) into your program. It may be two steps, but this is much easier and faster than reading into memory, sorting, then working on the result set. You can read each line knowing it's already sorted, which removes the need to hold 60 MiB in memory.
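If you'd rather launch the command from VB.NET than from a batch file, something along these lines should work. It's a sketch using System.Diagnostics.Process instead of ShellExecute, with the same example paths as the command above; it assumes sort.exe can be found on the PATH.

```vbnet
Imports System
Imports System.Diagnostics

Module ExternalSortSketch
    Sub Main()
        ' Example paths taken from the SORT command above.
        Dim inputPath As String = "C:\MyFile.CSV"
        Dim outputPath As String = "C:\MyFile_Sorted.CSV"

        Dim startInfo As New ProcessStartInfo()
        startInfo.FileName = "sort.exe"
        ' Quote both paths so that spaces in file names do not split the arguments.
        startInfo.Arguments = """" & inputPath & """ /O """ & outputPath & """"
        startInfo.UseShellExecute = False
        startInfo.CreateNoWindow = True

        ' Process.Start throws if sort.exe cannot be found, so callers that
        ' need a fallback should catch that and sort in code instead.
        Using sortProcess As Process = Process.Start(startInfo)
            sortProcess.WaitForExit()
            If sortProcess.ExitCode <> 0 Then
                Console.WriteLine("sort.exe failed with exit code " & sortProcess.ExitCode.ToString())
            End If
        End Using
    End Sub
End Module
```

Once the process exits successfully, the sorted file can be read line by line with a StreamReader.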