How can I sort timestamps in dd:mm:yyyy hh24:mi:ss format in descending order in Perl?
I have to sort my hash keys, which are timestamps (dd:mm:yyyy hh24:mi:ss),
in descending order.
sort { $b <=> $a } keys %time_spercent
This does not give me what I intend. Instead, entries with higher hours and minutes come first even when their date is not the latest. For example, this is what I get when I sort as shown above:
21:01:2011 16:51:09
21:01:2011 16:49:54
26:01:2011 11:02:55
26:01:2011 11:01:40
05:04:2011 11:51:13
05:04:2011 11:51:13
05:04:2011 11:48:37
05:04:2011 11:48:37
Instead, I want them ordered by both date and time, like this:
05:04:2011 11:51:13
05:04:2011 11:51:13
05:04:2011 11:48:37
26:01:2011 11:02:55
26:01:2011 11:01:40
05:04:2011 11:48:37
21:01:2011 16:51:09
21:01:2011 16:49:54
Any pointers or suggestions on how this could be done would be gratefully received.
Update
foreach my $status_date(
map { $_->[0] }
sort { $b->[1] cmp $a->[1] }
map { [$_, sorting_desc($_)] } keys % {$com_sam->{ $s1 } } )
and
sub sorting_desc {
$_ = shift;
if (/(\d+):(\d+):(\d+) (\d+):(\d+):(\d+)/) {
return "$2:$1:$3:$4:$5:$6";
}
}
is the subroutine for sorting.
I also tried
foreach my $status_date(
map { $_->[0] }
sort { $b->[1] cmp $a->[1] }
map { [$_, (split/[:\s][1]] } keys % {$com_sam->{ $s1 } } )
but that did not give the expected results either.
All I get is:
WGA_PD7124a WGA_PD7124a 95(2) 95(2) 95 100.00 193 Unknown(Unknown) 192654 01:07:2011 16:13:55
WGA_PD7124a WGA_PD7124a 95(2) 95(2) 95 100.00 193 Unknown(Unknown) 192655 01:07:2011 16:11:23
WGA_PD7124a WGA_PD7124a 95(2) 95(2) 95 100.00 193 Male(Unknown) 192656 01:07:2011 11:04:26
WGA_PD6355b WGA_PD6355b 96(1) 96(1) 96 100.00 388 Unknown(Unknown) 184558 04:05:2011 17:35:52
WGA_PD6355b WGA_PD6355a 96(1) 66(31) 66 95.45 388 Unknown(Unknown) 184558 04:05:2011 17:35:52
WGA_PD6355b WGA_PD6355b 96(1) 96(1) 96 100.00 388 Unknown(Unknown) 184557 04:05:2011 17:34:27
WGA_PD6355b WGA_PD6355a 96(1) 66(31) 66 95.45 388 Unknown(Unknown) 184557 04:05:2011 17:34:27
3074 3074 87(10) 87(10) 87 100.00 109 Unknown(Unknown) 174878 15:02:2011 09:24:31
3074 3074 87(10) 87(10) 87 100.00 109 Unknown(Unknown) 174970 15:02:2011 09:21:19
3074 3074 87(10) 87(10) 87 100.00 109 Female(Unknown) 174860 15:02:2011 09:16:32
3163 3163 90(7) 90(7) 90 100.00 176 Unknown(Unknown) 173382 09:02:2011 09:54:48
3163 3163 90(7) 90(7) 90 100.00 176 Unknown(Unknown) 173284 09:02:2011 09:51:02
CHP-212 CHP-212 94(3) 94(3) 94 100.00 269 Unknown(Unknown) 173382 09:02:2011 09:54:48
CHP-212 CHP-212 94(3) 94(3) 94 100.00 269 Unknown(Unknown) 173284 09:02:2011 09:51:02
MGH_2631 MGH_2631 90(8) 90(8) 90 100.00 211 Male(Unknown) 200943 01:09:2011 10:48:18
MGH_2631 MGH_2631 90(8) 90(8) 90 100.00 211 Unknown(Unknown) 200944 25:08:2011 10:20:16
MGH_2631 MGH_2631 90(8) 90(8) 90 100.00 211 Unknown(Unknown) 200945 25:08:2011 10:19:05
MGH_2631 MGH_2631 90(8) 90(8) 90 100.00 211 Male(Unknown) 200946 25:08:2011 10:17:26
MGH_2101 MGH_2101 80(18) 80(18) 80 100.00 359 Male(Unknown) 200943 01:09:2011 10:48:18
MGH_2101 MGH_2101 80(18) 80(18) 80 100.00 359 Unknown(Unknown) 200944 25:08:2011 10:20:16
MGH_2101 MGH_2101 80(18) 80(18) 80 100.00 359 Unknown(Unknown) 200945 25:08:2011 10:19:05
MGH_2101 MGH_2101 80(18) 80(18) 80 100.00 359 Male(Unknown) 200946 25:08:2011 10:17:26
PD4294c PD4294c 95(2) 95(2) 95 100.00 221 Unknown(Unknown) 179502 23:03:2011 10:03:23
PD4294c PD4294c 95(2) 95(2) 95 100.00 221 Unknown(Unknown) 179470 23:03:2011 10:02:30
Can you change your format to yyyy:mm:dd hh24:mi:ss? At that point you'd have a natural ordering. Basically it's a lot more machine-friendly to have everything in decreasing order of importance :)
EDIT: Then just order using string comparisons, as it will naturally sort the right way.
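A minimal sketch of that idea, assuming the timestamps are already stored as zero-padded yyyy:mm:dd hh24:mi:ss strings (the sample values below are some of the question's dates rewritten by hand for illustration):

```perl
use strict;
use warnings;

# Timestamps already in yyyy:mm:dd hh24:mi:ss order, zero-padded.
my @stamps = (
    '2011:01:21 16:51:09',
    '2011:04:05 11:48:37',
    '2011:01:26 11:02:55',
    '2011:04:05 11:51:13',
);

# Plain string comparison; swapping $a and $b gives descending order.
my @sorted = sort { $b cmp $a } @stamps;

print "$_\n" for @sorted;
```

Because every field has a fixed width and the most significant field comes first, string order and chronological order coincide here.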
From your question it is unclear to me how you really want to sort and how you produced the examples. I cannot detect any order in the example of your expected sort order. A likely solution is at the bottom.
Let me clarify:
Given a textfile "ts" with the following content (your example):
> cat ts
21:01:2011 16:51:09
21:01:2011 16:49:54
26:01:2011 11:02:55
26:01:2011 11:01:40
05:04:2011 11:51:13
05:04:2011 11:51:13
05:04:2011 11:48:37
05:04:2011 11:48:37
A standard sort produces the following output:
> perl -e '@a = <>; print sort @a' ts
05:04:2011 11:48:37
05:04:2011 11:48:37
05:04:2011 11:51:13
05:04:2011 11:51:13
21:01:2011 16:49:54
21:01:2011 16:51:09
26:01:2011 11:01:40
26:01:2011 11:02:55
While the numerically descending sort you proposed produces the following order:
> perl -e '@a = <>; print sort { $b <=> $a } @a' ts
26:01:2011 11:02:55
26:01:2011 11:01:40
21:01:2011 16:51:09
21:01:2011 16:49:54
05:04:2011 11:51:13
05:04:2011 11:51:13
05:04:2011 11:48:37
05:04:2011 11:48:37
To clarify the numerical sort: the spaceship operator <=> forces a numerical interpretation of its two operands, so the strings $a and $b, each containing a date and time, are treated as numbers. When Perl converts such a string, it reads the leading digits and stops at the first non-numeric character, here the first ':'. That is why the time, and even the month and year, are completely ignored and we are only sorting by the day of the month in descending order.
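This small sketch illustrates the conversion. The variable names are only for illustration, and the 'numeric' warnings are switched off locally because Perl would otherwise complain that the arguments are not numeric:

```perl
use strict;
use warnings;

my $stamp = '21:01:2011 16:51:09';

# Numeric context keeps only the leading digits of the string.
my $as_number = do { no warnings 'numeric'; $stamp + 0 };
print "$as_number\n";    # prints 21

# Two different timestamps on the same day of the month compare as equal.
my $same_day = do {
    no warnings 'numeric';
    '21:01:2011 16:51:09' <=> '21:12:2010 00:00:00';
};
print "$same_day\n";     # prints 0: only the day was compared
```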
Finally, if you really want to reverse sort by date, then by time, and need to keep the format, you can use this code:
> perl -e '@a = <>; sub dmyt2ymdt { my $dmyt=shift; $ymdt=join(q(), (split(/[:\s]+/,$dmyt))[2,1,0,3,4,5])} print sort { dmyt2ymdt($b) <=> dmyt2ymdt($a) } @a' ts
05:04:2011 11:51:13
05:04:2011 11:51:13
05:04:2011 11:48:37
05:04:2011 11:48:37
26:01:2011 11:02:55
26:01:2011 11:01:40
21:01:2011 16:51:09
21:01:2011 16:49:54
Here's a nicer formatted version (which I did not test):
sub dmyt2ymdt {
my $dmyt = shift;
my ($day, $mon, $year, $h, $m, $s) = split(/[:\s]+/, $dmyt);
return join('', $year, $mon, $day, $h, $m, $s);
}
This sort function
sort { dmyt2ymdt($b) <=> dmyt2ymdt($a) }
then calls the above helper quite a lot. In your example there are 8 entries in the list to sort and the function gets called 24 times, so it is not efficient. For small lists of up to a few hundred or even a few thousand entries that may be acceptable. For large lists, you should do the format conversion only once per element, although that costs memory. So for large lists you need to trade off memory against execution time, as is often the case.
If performance is the optimization criterion, you could do the transformation on the fly, as shown in other answers and comments, like this:
sort { $b <=> $a } map { dmyt2ymdt($_) } @a
for my example above. Now the conversion is done only once per element, but note that the result is the list of converted keys, not the original timestamps. We still have to hold a temporary list in memory. I'm not sure how well perl can optimize the above construct. One might think that the following is easier to optimize:
reverse sort map { dmyt2ymdt($_) } @a
which works for the test set, too. Here sort falls back to its default string comparison, which gives the same order as a numerical comparison when all strings have identical length and contain only digits.
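If the original timestamps are needed in the output, one possible sketch is to compute each key once, store it in a hash, and sort the originals by the stored keys. This is a precomputed-key variant rather than anything from the one-liners above, and %key_for is a hypothetical name:

```perl
use strict;
use warnings;

# Same conversion as above: dd:mm:yyyy hh:mi:ss -> yyyymmddhhmiss
sub dmyt2ymdt {
    my $dmyt = shift;
    my ($day, $mon, $year, $h, $m, $s) = split /[:\s]+/, $dmyt;
    return join '', $year, $mon, $day, $h, $m, $s;
}

my @a = (
    '21:01:2011 16:51:09',
    '26:01:2011 11:02:55',
    '05:04:2011 11:51:13',
    '05:04:2011 11:48:37',
);

# Build the sort key once per element and remember which original
# timestamp it belongs to.
my %key_for = map { $_ => dmyt2ymdt($_) } @a;

# Sort the originals by their cached keys, newest first.
my @sorted = sort { $key_for{$b} cmp $key_for{$a} } @a;

print "$_\n" for @sorted;
```

The hash costs extra memory, which is the trade-off mentioned above, but the helper runs only once per distinct timestamp.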
Hope this helps!
Jon Skeet's answer is better! (i.e., just change your time stamp, if you can, to the ISO 8601 format.)
But if you can't change the format, you could do something like:
#!/usr/bin/perl -w
use strict;
my %h;
while(<DATA>) {
chomp;
$h{$_}++;
}
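sub iso_8601 {
my ($stamp) = @_; # use a lexical instead of overwriting the global $_
if ($stamp =~ /(\d+):(\d+):(\d+) (\d+):(\d+):(\d+)/) {
# reorder to year:month:day so that string comparison sorts chronologically
return "$3:$2:$1:$4:$5:$6";
}
return ''; # no timestamp found
}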
foreach my $key (sort {iso_8601($a) cmp iso_8601($b)} keys %h) {
print "$key -- $h{$key}\n";
}
__DATA__
21:01:2011 16:51:09
21:01:2011 16:49:54
26:01:2011 11:02:55
26:01:2011 11:01:40
05:04:2011 11:51:13
05:04:2011 11:51:13
05:04:2011 11:48:37
05:04:2011 11:48:37
(I assume you have your own logic for dealing with duplicate time stamps. Because they are used as hash keys here, duplicates collapse into a single key, and I am just printing their count.)
Result:
21:01:2011 16:49:54 -- 1
21:01:2011 16:51:09 -- 1
26:01:2011 11:01:40 -- 1
26:01:2011 11:02:55 -- 1
05:04:2011 11:48:37 -- 2
05:04:2011 11:51:13 -- 2
Edit
OK, if you are concerned about efficiency, the (sort {iso_8601($a) cmp iso_8601($b)} keys %h) is not the best, since the iso_8601() function is called many times per hash element.
For a form of "Schwartzian Transform" you can do:
print join("\n",
map { $_->[0].' -- '.$h{$_->[0]} }
sort { $a->[1] cmp $b->[1] }
map {[$_,iso_8601($_)]}
keys %h);
This produces the same output as above, but iso_8601() is now called only once per hash key instead of multiple times.
To dissect that (it goes right to left, bottom to top):
keys %h # list of all the keys of the hash
map {[$_,iso_8601($_)]} # create anon array with 2 elements:
# original stamp and ISO 8601 stamp
sort { $a->[1] cmp $b->[1] } # list sorted on the ISO 8601 stamp
map { $_->[0].' -- '.$h{$_->[0]} } # a list of strings with original stamp
# and hash count
join("\n", # join the list into a string with a "\n"
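The same pipeline can also be written one step at a time. This is only a sketch: the intermediate variable names are made up for illustration, the hash holds three of the sample stamps, and the helper returns an empty string for unparseable input as an assumption:

```perl
use strict;
use warnings;

# Reorders dd:mm:yyyy hh:mi:ss into a string that sorts chronologically.
sub iso_8601 {
    my ($stamp) = @_;
    return "$3:$2:$1:$4:$5:$6"
        if $stamp =~ /(\d+):(\d+):(\d+) (\d+):(\d+):(\d+)/;
    return '';    # assumption: unparseable stamps sort first
}

my %h = (
    '21:01:2011 16:51:09' => 1,
    '26:01:2011 11:01:40' => 1,
    '05:04:2011 11:48:37' => 2,
);

# Step 1: all keys of the hash.
my @keys = keys %h;
# Step 2: pair each original stamp with its sortable form.
my @decorated = map { [ $_, iso_8601($_) ] } @keys;
# Step 3: sort the pairs on the sortable form.
my @ordered = sort { $a->[1] cmp $b->[1] } @decorated;
# Step 4: drop the sortable form and build the output strings.
my @lines = map { $_->[0] . ' -- ' . $h{ $_->[0] } } @ordered;
# Step 5: join everything into one string.
print join("\n", @lines), "\n";
```

Swapping $a and $b in step 3 would give the descending order asked for in the question.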
EDIT 2
I am having a hard time understanding what you want. Try this:
#!/usr/bin/perl -w
use strict;
my %h;
my $i=0;
while(<DATA>) {
chomp;
$h{$_}++;
}
sub iso_8601 {
$_ = shift;
if (/(\d+):(\d+):(\d+) (\d+):(\d+):(\d+)$/) {
$i++;
return "$3-$2-$1 $4:$5:$6";
}
}
foreach my $key (sort {iso_8601($b) cmp iso_8601($a)} keys %h) {
print iso_8601($key).":\t\t"."$key -- $h{$key}\n";
}
print "\n";
Output:
YYYY-MM-DD HH:MM:SS your record...
2011-09-01 10:48:18: MGH_2631 MGH_2631 90(8) 90(8) 90 100.00 211 Male(Unknown) 200943 01:09:2011 10:48:18 -- 1
2011-09-01 10:48:18: MGH_2101 MGH_2101 80(18) 80(18) 80 100.00 359 Male(Unknown) 200943 01:09:2011 10:48:18 -- 1
2011-08-25 10:20:16: MGH_2101 MGH_2101 80(18) 80(18) 80 100.00 359 Unknown(Unknown) 200944 25:08:2011 10:20:16 -- 1
2011-08-25 10:20:16: MGH_2631 MGH_2631 90(8) 90(8) 90 100.00 211 Unknown(Unknown) 200944 25:08:2011 10:20:16 -- 1
2011-08-25 10:19:05: MGH_2631 MGH_2631 90(8) 90(8) 90 100.00 211 Unknown(Unknown) 200945 25:08:2011 10:19:05 -- 1
2011-08-25 10:19:05: MGH_2101 MGH_2101 80(18) 80(18) 80 100.00 359 Unknown(Unknown) 200945 25:08:2011 10:19:05 -- 1
2011-08-25 10:17:26: MGH_2101 MGH_2101 80(18) 80(18) 80 100.00 359 Male(Unknown) 200946 25:08:2011 10:17:26 -- 1
2011-08-25 10:17:26: MGH_2631 MGH_2631 90(8) 90(8) 90 100.00 211 Male(Unknown) 200946 25:08:2011 10:17:26 -- 1
2011-07-01 16:13:55: WGA_PD7124a WGA_PD7124a 95(2) 95(2) 95 100.00 193 Unknown(Unknown) 192654 01:07:2011 16:13:55 -- 1
2011-07-01 16:11:23: WGA_PD7124a WGA_PD7124a 95(2) 95(2) 95 100.00 193 Unknown(Unknown) 192655 01:07:2011 16:11:23 -- 1
2011-07-01 11:04:26: WGA_PD7124a WGA_PD7124a 95(2) 95(2) 95 100.00 193 Male(Unknown) 192656 01:07:2011 11:04:26 -- 1
2011-05-04 17:35:52: WGA_PD6355b WGA_PD6355b 96(1) 96(1) 96 100.00 388 Unknown(Unknown) 184558 04:05:2011 17:35:52 -- 1
2011-05-04 17:35:52: WGA_PD6355b WGA_PD6355a 96(1) 66(31) 66 95.45 388 Unknown(Unknown) 184558 04:05:2011 17:35:52 -- 1
2011-05-04 17:34:27: WGA_PD6355b WGA_PD6355b 96(1) 96(1) 96 100.00 388 Unknown(Unknown) 184557 04:05:2011 17:34:27 -- 1
2011-05-04 17:34:27: WGA_PD6355b WGA_PD6355a 96(1) 66(31) 66 95.45 388 Unknown(Unknown) 184557 04:05:2011 17:34:27 -- 1
2011-03-23 10:03:23: PD4294c PD4294c 95(2) 95(2) 95 100.00 221 Unknown(Unknown) 179502 23:03:2011 10:03:23 -- 1
2011-03-23 10:02:30: PD4294c PD4294c 95(2) 95(2) 95 100.00 221 Unknown(Unknown) 179470 23:03:2011 10:02:30 -- 1
2011-02-15 09:24:31: 3074 3074 87(10) 87(10) 87 100.00 109 Unknown(Unknown) 174878 15:02:2011 09:24:31 -- 1
2011-02-15 09:21:19: 3074 3074 87(10) 87(10) 87 100.00 109 Unknown(Unknown) 174970 15:02:2011 09:21:19 -- 1
2011-02-15 09:16:32: 3074 3074 87(10) 87(10) 87 100.00 109 Female(Unknown) 174860 15:02:2011 09:16:32 -- 1
2011-02-09 09:54:48: CHP-212 CHP-212 94(3) 94(3) 94 100.00 269 Unknown(Unknown) 173382 09:02:2011 09:54:48 -- 1
2011-02-09 09:54:48: 3163 3163 90(7) 90(7) 90 100.00 176 Unknown(Unknown) 173382 09:02:2011 09:54:48 -- 1
2011-02-09 09:51:02: 3163 3163 90(7) 90(7) 90 100.00 176 Unknown(Unknown) 173284 09:02:2011 09:51:02 -- 1
2011-02-09 09:51:02: CHP-212 CHP-212 94(3) 94(3) 94 100.00 269 Unknown(Unknown) 173284 09:02:2011 09:51:02 -- 1
Is this what you are thinking? It parses the time stamp at the end of line and sorts those records in descending order. What is the issue with this?
I had the same problem some time ago, and I solved it by transforming the format while sorting the list, in the same way Jon Skeet has proposed. This is my piece of code:
my @source = <DATA>;
my @data = sort {$a<=>$b} map { m!(\d+):(\d+):(\d+) (\d+):(\d+):(\d+)!; "$3$2$1$4$5$6";} @source;
foreach ( @data ) {
s!(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})!$3:$2:$1 $4:$5:$6!;
print $_, "\n";
}
__DATA__
05:04:2011 11:48:37
21:01:2011 16:49:54
26:01:2011 11:02:55
26:01:2011 11:01:40
05:04:2011 11:51:13
05:04:2011 11:51:13
05:04:2011 11:48:37
21:01:2011 16:51:09
15:04:2012 11:48:37
The result is:
21:01:2011 16:49:54
21:01:2011 16:51:09
26:01:2011 11:01:40
26:01:2011 11:02:55
05:04:2011 11:48:37
05:04:2011 11:48:37
05:04:2011 11:51:13
05:04:2011 11:51:13
15:04:2012 11:48:37
First, understand what you are trying to do. Next, get it to work. Then, if necessary, optimize.
One way to easily compare the time stamps is to convert them to offsets from an epoch. You can use Time::Local. Given that you are not getting arbitrary values, but rather well defined timestamps, you could engage in a little premature optimization and use the _nocheck version of timelocal or timegm.
Here is one way to do it using the sample data you provided:
#!/usr/bin/env perl
use strict; use warnings;
use Time::Local 'timelocal';
my @data;
while (my $line = <DATA>) {
last unless $line =~ /\S/;
chomp $line;
push @data, [ split ' ', $line ];
}
@data = sort compare_records_descending_time @data;
print join("\t", @$_), "\n" for @data;
sub compare_records_descending_time {
return ts2time($b) <=> ts2time($a);
}
sub ts2time {
my ($record) = @_;
my $ts = "@{ $record }[-2, -1]";
# timestamp is day:mon:year hr:min:sec
my ($day, $mon, $year, $hr, $min, $sec) = $ts =~ /([0-9]+)/g;
# timelocal expects sec, min, hr, day, mon, year, with mon counted from 0
return timelocal($sec, $min, $hr, $day, $mon - 1, $year);
}
__DATA__
124a WGA_PD7124a 95(2) 95(2) 95 100.00 193 Unknown(Unknown) 192654 01:07:2011 16:13:55
WGA_PD7124a WGA_PD7124a 95(2) 95(2) 95 100.00 193 Unknown(Unknown) 192655 01:07:2011 16:11:23
WGA_PD7124a WGA_PD7124a 95(2) 95(2) 95 100.00 193 Male(Unknown) 192656 01:07:2011 11:04:26
WGA_PD6355b WGA_PD6355b 96(1) 96(1) 96 100.00 388 Unknown(Unknown) 184558 04:05:2011 17:35:52
WGA_PD6355b WGA_PD6355a 96(1) 66(31) 66 95.45 388 Unknown(Unknown) 184558 04:05:2011 17:35:52
WGA_PD6355b WGA_PD6355b 96(1) 96(1) 96 100.00 388 Unknown(Unknown) 184557 04:05:2011 17:34:27
WGA_PD6355b WGA_PD6355a 96(1) 66(31) 66 95.45 388 Unknown(Unknown) 184557 04:05:2011 17:34:27
3074 3074 87(10) 87(10) 87 100.00 109 Unknown(Unknown) 174878 15:02:2011 09:24:31
3074 3074 87(10) 87(10) 87 100.00 109 Unknown(Unknown) 174970 15:02:2011 09:21:19
3074 3074 87(10) 87(10) 87 100.00 109 Female(Unknown) 174860 15:02:2011 09:16:32
3163 3163 90(7) 90(7) 90 100.00 176 Unknown(Unknown) 173382 09:02:2011 09:54:48
3163 3163 90(7) 90(7) 90 100.00 176 Unknown(Unknown) 173284 09:02:2011 09:51:02
CHP-212 CHP-212 94(3) 94(3) 94 100.00 269 Unknown(Unknown) 173382 09:02:2011 09:54:48
CHP-212 CHP-212 94(3) 94(3) 94 100.00 269 Unknown(Unknown) 173284 09:02:2011 09:51:02
MGH_2631 MGH_2631 90(8) 90(8) 90 100.00 211 Male(Unknown) 200943 01:09:2011 10:48:18
MGH_2631 MGH_2631 90(8) 90(8) 90 100.00 211 Unknown(Unknown) 200944 25:08:2011 10:20:16
MGH_2631 MGH_2631 90(8) 90(8) 90 100.00 211 Unknown(Unknown) 200945 25:08:2011 10:19:05
MGH_2631 MGH_2631 90(8) 90(8) 90 100.00 211 Male(Unknown) 200946 25:08:2011 10:17:26
MGH_2101 MGH_2101 80(18) 80(18) 80 100.00 359 Male(Unknown) 200943 01:09:2011 10:48:18
MGH_2101 MGH_2101 80(18) 80(18) 80 100.00 359 Unknown(Unknown) 200944 25:08:2011 10:20:16
MGH_2101 MGH_2101 80(18) 80(18) 80 100.00 359 Unknown(Unknown) 200945 25:08:2011 10:19:05
MGH_2101 MGH_2101 80(18) 80(18) 80 100.00 359 Male(Unknown) 200946 25:08:2011 10:17:26
PD4294c PD4294c 95(2) 95(2) 95 100.00 221 Unknown(Unknown) 179502 23:03:2011 10:03:23
PD4294c PD4294c 95(2) 95(2) 95 100.00 221 Unknown(Unknown) 179470 23:03:2011 10:02:30
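The _nocheck variant mentioned above could be used along these lines. This is a sketch rather than a drop-in replacement: stamp_to_epoch is a hypothetical helper, the stamps are treated as UTC via timegm_nocheck (enough for ordering when all stamps share a time zone), and the epoch is computed once per element:

```perl
use strict;
use warnings;
use Time::Local qw(timegm_nocheck);

# Convert "dd:mm:yyyy hh:mi:ss" to epoch seconds. timegm_nocheck skips
# range validation, so it should only be given well-formed timestamps.
sub stamp_to_epoch {
    my ($stamp) = @_;
    my ($day, $mon, $year, $hr, $min, $sec) = $stamp =~ /([0-9]+)/g;
    # Time::Local counts months from 0, hence $mon - 1.
    return timegm_nocheck($sec, $min, $hr, $day, $mon - 1, $year);
}

my @stamps = (
    '23:03:2011 10:02:30',
    '01:09:2011 10:48:18',
    '25:08:2011 10:20:16',
);

# Pair each stamp with its epoch value, sort newest first, then unwrap.
my @sorted = map  { $_->[0] }
             sort { $b->[1] <=> $a->[1] }
             map  { [ $_, stamp_to_epoch($_) ] } @stamps;

print "$_\n" for @sorted;
```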