Extracting line after new line

2023-03-14 09:42 问答作者：

I have a text file with something like

Country1
city1
city2

Country2
city3
city4

I want to separate country and cities. Is there any quick way of doing it? I am thinking of some fil开发者_Python百科e handling and then extracting to different files, is it best way or can be done with some regex etc quickly?

countries=[]
cities=[]
with open("countries.txt") as f:
    gap=True
    for line in f:
        line=line.strip()
        if gap:
            countries.append(line)
            gap=False
        elif line=="":
            gap=True
        else:
            cities.append(line)
print countries
print cities

output:

['Country1', 'Country2']
['city1', 'city2', 'city3', 'city4']

if you want to write these to files:

with open("countries.txt","w") as country_file, open("cities.txt","w") as city_file:
    country_file.write("\n".join(countries))
    city_file.write("\n".join(cities))

f = open('b.txt', 'r')
status = True
country = []
city = []
for line in f:
    line = line.strip('\n').strip()
    if line:
        if status:
            country.append(line)
            status = False
        else:
            city.append(line)
    else:
        status = True

print country
print city


output :

>>['city1', 'city2', 'city3', 'city4']
>>['Country1', 'Country2']

$countries = array();
$cities = array();
$gap = false;
$file = file('path/to/file');
foreach($file as $line)
{
  if($line == '') $gap = true;
  elseif ($line != '' and $gap) 
  {
    $countries[] = $line;
    $gap = false;
  }
  elseif ($line != '' and !$gap) $cities[] = $line;
}

Depending on how regular your file is, it may be this simple in python:

with open('inputfile.txt') as fh:
  # To iterate over the entire file.
  for country in fh:
    cityLines = [next(fh) for _i in range(2)]

    # read a blank line to advance countries.
    next(fh)

That's not likely to be exactly right, because I imagine many countries have variable numbers of cities. You could modify it like so to address that:

with open('inputfile.txt') as fh:
  # To iterate over the entire file.
  for country in fh:
    # we assume here that each country has at least 1 city.
      cities = [next(fh).strip()]

      while cities[-1]: # will continue until we encounter a blank line.
        cities.append(next(fh).strip())

That doesn't do anything to put the data into an output file, or store it much past the file handle itself, but it's a start. You really should choose a language for your questions though. A lot of the time until

Another PHP example that doesn't read the entire file in an array.

<?php

$fh = fopen('countries.txt', 'r');

$countries = array();
$cities = array();

while ( $data = fgets($fh) )
{
  // If $country is empty (or not defined), the this line is a country.
  if ( ! isset($country) )
  {
    $country = trim($data);
    $countries[] = $country;
  }
  // If an empty line is found, unset $country.
  elseif ( ! trim($data) )
    unset($country);
  // City
  else
    $cities[$country][] = trim($data);
}

fclose($fh);

The $countries array will contain a list of countries while the $cities array will contain a list of cities by countries.

Is there some pattern that distinguishes countries from cities? Or is it that the first line after a blank line is a country and all subsequent lines are city names until the next blank line? Alternatively are you finding countries based on a look-up table (a "dictionary" in Python; an associative array in PHP; a hash in Perl --- one that includes all the officially recognized countries)?

Is it safe to assume that there are no cities whose names collide with any country? Is there a France, Iowa, USA, or the old Usa, Japan?

What do you want to do with these after you separate them? You mention "some file handling and then extracting to different files" --- are you thinking of something like one file per country containing a list of all the cities therein? Or one directory per country and one file per city?

The obvious approach would be to iterate over the file, line by line, and maintain a little state machine: empty (beginning of file, blank lines between countries?) during which you enter the "country" state (whenever you found any pattern that matches whatever criteria means you've encountered the name of a country). Once you've found a country name then you're in the city loading state. I would create a dictionary using country names as keys and set of cities as cities (though perhaps you might really need county/provice, city name tuples in cases where a country has multiple cities by the same name: Portland, Maine vs. Portland, Oregon, for example). You can also have some "error" state if the contents of your file lead to some sort of ambiguity (city names before you've determined a country, two country names in a row, whatever).

It's hard to suggest a good fragment of code given how vague your spec. here is.

Not sure that this would help, but you can try to use the following code to get dictionary and then work with it(write to files, compare and etc):

res = {}
with open('c:\\tst.txt') as f:
    lines = f.readlines()
    for i,line in enumerate(lines):
        line = line.strip()
        if (i == 0 and line):
            key = line
            res[key] = []
        elif not line and i+1 < len(lines):
            key = lines[i+1].strip()
            res[key] = []
        elif line and line != key:
            res[key].append(line)
print res

This regex would work for your example:

/(?:^|\r\r)(.+?)\r(.+?)(?=\r\r|$)/s

Catches countries in group 1 and cities in group 2. You may have to adjust your newline characters, depending on your system. They can be \n, \r or \r\n. edit: added a $ sign, so you don't need two linebreaks at the end. You will need the flag for dotall for the regex to work as expected.

Print fild 1 with awk - countries

awk 'BEGIN {RS="";FS="\n"} {print $1 > "countries"} {for (i=2;i<=NF;i++) print $i > "cities"}' source.txt

继续阅读：parsing php python regex vim

Extracting line after new line

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？