Updating a row in a data file with values from another row

I have some data in tab-delimited form that gives the result of device identification from user agents (UAs). However, in several rows the device is wrongly identified, and I need to change those rows to the correct device.

For instance, there are cases where an iPhone or HTC Wildfire UA is identified as another phone. For these cases I need to update the device information with the correct device by searching for certain keywords in the UA. For example:

781 Mozilla/5.0 (Linux; U; Android 2.1-update1; fi-fi; HTC_Wildfire_A3333 Build/ERE27) AppleWebKit/530.17 (KHTML, like Gecko) Version/4.0 Mobile Safari/530.17  htc_wildfire_ver1_suba3333    HTC       Wildfire    Android

This is correct, but a similar case is wrong:

775 Mozilla/5.0 (Linux; U; Android 2.1-update1; fi-fi; HTC Wildfire Build/ERE27) AppleWebKit/525.10+ (KHTML, like Gecko) Version/3.0.4 Mobile Safari/523.12.2 (AdMob-ANDROID-20100709)   T-Mobile       Pulse   Android

So I have to do something like this. I know that if the UA column contains the terms HTC and Wildfire, it is that phone. I want to find all the UAs that contain the strings HTC and Wildfire but whose columns 3 and 4 (manufacturer and model) are wrong, and then update them with the correct device information from row 781, which I know is correct. I would manually state in the code that row 781 is correct, and for every row where the device is not correctly identified I would put in the info from column 3 onwards of row 781.

Of course this is one case and there are several cases like this and I would repeat the same logic for each of them. Also there are other columns besides these four that I've not shown.

How would I accomplish this in a Perl script (preferably, but a bash solution is also OK)?


  1. Create a file (devices) with all distinct (UA, Manufacturer, Model) triples by looping over the input file, storing the triple as keys in a hash; write sorted keys into devices
  2. Manually edit devices (delete 'wrong' lines)
  3. Load devices into a hash, use UA as key, (Manufacturer, Model) as value. Loop over the input file, use UA field of current line to lookup the device, change both fields using the good value from the hash (if necessary).

    use strict;
    use warnings;

    # fake log: (UA, Model) pairs, some with a wrong model
    my @Log = (
        [ 'HTC', 'badModelHTC'  ]
      , [ 'ABC', 'badModelABC' ]
      , [ 'HTC', 'goodModelHTC' ]
      , [ 'ABC', 'badModelABC' ]
      , [ 'ABC', 'goodModelABC' ]
      , [ 'HTC', 'goodModelHTC' ]
      , [ 'ABC', 'badModelABC' ]
    );
    my %Devs;
    printf "----------- Log org\n";
    for (@Log) {
      printf "%s %s\n", @{$_};
      my $key = join '-', @{$_};
      $Devs{ $key } = $_->[ 1 ];
    }
    printf "----------- Devs org\n";
    for (sort( keys( %Devs ) )) {
      printf "%s => %s\n", $_, $Devs{ $_ };
      if (/bad/) {
          delete $Devs{ $_ };  # fake manual removal
      }
    }
    # fake manual shortening of keys
    my %Tmp = %Devs;
    %Devs = ();
    for (keys %Tmp) {
      $Devs{ (split( /-/, $_))[ 0 ] } = $Tmp{ $_ };
    }
    printf "----------- Devs corrected\n";
    for (sort( keys( %Devs ) )) {
      printf "%s => %s\n", $_, $Devs{ $_ };
    }
    printf "----------- Log corrected\n";
    for (@Log) {
      $_->[ 1 ] = $Devs{ $_->[ 0 ] };
      printf "%s %s\n", @{$_};
    }

output:

    ----------- Log org
    HTC badModelHTC
    ABC badModelABC
    HTC goodModelHTC
    ABC badModelABC
    ABC goodModelABC
    HTC goodModelHTC
    ABC badModelABC
    ----------- Devs org
    ABC-badModelABC => badModelABC
    ABC-goodModelABC => goodModelABC
    HTC-badModelHTC => badModelHTC
    HTC-goodModelHTC => goodModelHTC
    ----------- Devs corrected
    ABC => goodModelABC
    HTC => goodModelHTC
    ----------- Log corrected
    HTC goodModelHTC
    ABC goodModelABC
    HTC goodModelHTC
    ABC goodModelABC
    ABC goodModelABC
    HTC goodModelHTC
    ABC goodModelABC
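
The keyword idea from the question (UA contains HTC and Wildfire, so copy the device columns from the known-good row 781) could be sketched roughly as follows. This is only a sketch under assumptions: the `@rules` table and the `fix_line` helper are hypothetical names, the column layout (id, UA, then the device columns) is assumed, the sample rows at the end are made up for illustration, and the good values are typed in by hand from row 781.

```perl
use strict;
use warnings;

# Hypothetical rule table: every keyword must occur in the UA; 'device'
# holds the known-good values for column 3 onwards (copied from row 781).
my @rules = (
    {
        keywords => [ 'HTC', 'Wildfire' ],
        device   => [ 'htc_wildfire_ver1_suba3333', 'HTC', 'Wildfire', 'Android' ],
    },
);

# Takes one tab-delimited line (id, UA, device columns ...) and returns it
# with the leading device columns overwritten if a rule matches the UA.
sub fix_line {
    my ($line) = @_;
    chomp $line;
    my ($id, $ua, @rest) = split /\t/, $line, -1;
    for my $rule (@rules) {
        # skip the rule unless every keyword occurs in the UA (case-insensitive)
        next if grep { $ua !~ /\Q$_\E/i } @{ $rule->{keywords} };
        my @good = @{ $rule->{device} };
        # overwrite the first device columns, keep any columns after them
        splice @rest, 0, scalar @good, @good;
        last;
    }
    return join "\t", $id, $ua, @rest;
}

# Made-up sample rows; the UAs are shortened and 'wrong_id' is a placeholder.
my @lines = (
    "775\tMozilla/5.0 (Linux; U; Android 2.1-update1; fi-fi; HTC Wildfire Build/ERE27)\twrong_id\tT-Mobile\tPulse\tAndroid",
    "900\tMozilla/5.0 (iPhone; U; CPU iPhone OS 4_0 like Mac OS X)\tapple_iphone\tApple\tiPhone\tiOS",
);
print fix_line($_), "\n" for @lines;
```

To run this over a real file, the last statement could be replaced by a `while (<>)` loop that prints `fix_line($_)` for each input line. Further cases (iPhone and so on) would be additional entries in `@rules`. The first matching rule wins, so more specific rules should come first.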