Data Manipulation: Stemming from an inability to select lists
I am very new to Python with no real prior programming knowledge. At my current job I am being asked to take data in the form of text from 500+ files and plot it. I understand the plotting to a degree, but I cannot figure out how to manipulate the data in a way that makes it easy to select specific sections. Currently this is what I have for opening a file:
fp = open("file")
for line in fp:
    words = line.strip().split()
    print(words)
The result is a list for each line of the file, but I can only access the last one created. Does anyone know a way that would let me select any of those lists? Thanks a lot!!
The easiest way to get a list of lines from a file is as follows:
with open('file', 'r') as f:
    lines = f.readlines()
Now you can split those lines or do whatever you want with them:
lines = [line.split() for line in lines]
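To connect this back to the loop in the question: there, words is overwritten on every pass, so only the final line survives once the loop ends. Keeping every split line in a list lets you pick any row or column afterwards. Here is a minimal sketch, assuming a whitespace-separated file of numbers; the sample file, its contents, and the read_rows helper are made up for illustration:

```python
import os
import tempfile


def read_rows(path):
    """Return one list of words per non-blank line in the file at `path`."""
    with open(path, 'r') as f:
        return [line.split() for line in f if line.strip()]


# Write a small sample file so the example runs on its own (hypothetical data).
sample_path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(sample_path, 'w') as f:
    f.write('1.0 2.0 3.0\n4.0 5.0 6.0\n7.0 8.0 9.0\n')

rows = read_rows(sample_path)
second_row = rows[1]                             # any line, chosen by index
first_column = [float(row[0]) for row in rows]   # one column, converted to numbers
```

The same helper could be called once per file in a loop if you have many files to process.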
I'm not certain that answers your question -- let me know if you have something more specific in mind.
Since I don't understand exactly what you are asking, here are a few more examples of how you might process a text file. You can experiment with these in the interactive interpreter, which you can generally access just by typing 'python' at the command line.
>>> with open('a_text_file.txt', 'r') as f:
...     text = f.read()
...
>>> text
'the first line of the text file\nthe second line -- broken by a symbol\nthe third line of the text file\nsome other data\n'
That's the raw, unprocessed text of the file. It's a string. Strings are immutable -- they can't be altered -- but they can be copied in part or in whole.
>>> text.splitlines()
['the first line of the text file', 'the second line -- broken by a symbol', 'the third line of the text file', 'some other data']
splitlines is a string method. It splits the string wherever it finds a \n (newline) character; it then returns a list containing copies of the separate sections of the string.
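One detail that may be worth knowing: splitlines and split('\n') behave differently when the text ends with a newline. A short sketch with made-up sample text:

```python
text = 'first line\nsecond line\n'

by_splitlines = text.splitlines()   # no empty string for the trailing newline
by_split = text.split('\n')         # leaves an empty string at the end
with_endings = text.splitlines(True)  # True keeps the newline on each piece
```

For reading files line by line, splitlines is usually the less surprising choice because it does not leave an empty entry at the end.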
>>> lines = text.splitlines()
Here I've just saved the above list of lines to a new variable name.
>>> lines[0]
'the first line of the text file'
Lists are accessed by indexing. Just provide an integer from 0 to len(lines) - 1 and the corresponding line is returned.
>>> lines[2]
'the third line of the text file'
>>> lines[1]
'the second line -- broken by a symbol'
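As a small addition to the indexing examples above, here is a sketch of what happens at the edges of the list. The list is copied from the sample file shown earlier; negative indices count back from the end, and an index past the last element raises an IndexError:

```python
lines = ['the first line of the text file',
         'the second line -- broken by a symbol',
         'the third line of the text file',
         'some other data']

last_line = lines[-1]                # negative indices count back from the end
same_line = lines[len(lines) - 1]    # the largest valid positive index

try:
    lines[len(lines)]                # one past the last valid index
    out_of_range = False
except IndexError:
    out_of_range = True
```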
Now you can start to manipulate individual lines.
>>> lines[1].split('--')
['the second line ', ' broken by a symbol']
split is another string method. It's like splitlines, but you can specify the character or string that you want to use as the demarcator.
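A few variations on split that tend to be useful for data files, sketched with the same sample line. With no argument it splits on any run of whitespace; strip can clean up the leftover spaces around a separator; and an optional second argument limits the number of splits:

```python
line = 'the second line -- broken by a symbol'

words = line.split()                                  # split on whitespace
parts = [part.strip() for part in line.split('--')]   # drop spaces around '--'
head, rest = line.split(' ', 1)                       # split only once
```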
>>> lines[1][4]
's'
You can also index the characters in a string.
>>> lines[1][4:10]
'second'
You can also "slice" a string. The result is a copy of the characters at indexes 4 through 9. 10 is the stop value, so the character at index 10 isn't included in the slice. (You can slice lists too.)
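Since slicing lists is probably what you need for selecting sections of a file, here is a short sketch using the list of lines from the sample file. The same start and stop rules apply as for strings:

```python
lines = ['the first line of the text file',
         'the second line -- broken by a symbol',
         'the third line of the text file',
         'some other data']

first_two = lines[0:2]     # indexes 0 and 1; index 2 is the stop value
last_two = lines[-2:]      # negative start counts back from the end
every_other = lines[::2]   # a third value sets the step
```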
>>> lines[1].index('broken')
19
If you want to find a substring within a string, one way is to use index. It returns the index at which the first occurrence of the substring appears. (It raises a ValueError if the substring isn't in the string. If you don't want that, use find, which returns -1 if the substring isn't in the string.)
>>> lines[1][19:]
'broken by a symbol'
Then you can use that to slice the string. If you don't provide a stop index, it just returns the remainder of the string.
>>> lines[1][:19]
'the second line -- '
If you don't provide a start index, it returns the beginning of the string and stops at the stop index.
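Putting the lookup and the slice together, here is a minimal sketch that handles a missing substring as well. The text_after helper is a made-up name for illustration:

```python
line = 'the second line -- broken by a symbol'


def text_after(line, marker):
    """Return the part of `line` starting at `marker`, or None if absent."""
    start = line.find(marker)   # find returns -1 when marker is not present
    if start == -1:
        return None
    return line[start:]


found = text_after(line, 'broken')
missing = text_after(line, 'missing')

# index raises ValueError instead of returning -1.
try:
    line.index('missing')
    raised = False
except ValueError:
    raised = True
```

Checking for -1 matters here: slicing with line[-1:] would quietly return the last character rather than fail.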
>>> [line for line in text.splitlines() if 'line' in line]
['the first line of the text file', 'the second line -- broken by a symbol', 'the third line of the text file']
You can also use in -- it's a boolean operation that returns True if a substring is in a string. In this case, I've used a list comprehension to get only the lines that have 'line' in them. (Note that the last line is missing from the list. It has been filtered.)
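If you also need to know where the matching lines sit in the file, enumerate can be combined with the same kind of filter. A short sketch using the sample text from above:

```python
text = ('the first line of the text file\n'
        'the second line -- broken by a symbol\n'
        'the third line of the text file\n'
        'some other data\n')

lines = text.splitlines()

# Keep (index, line) pairs so the position of each match is not lost.
with_symbol = [(i, line) for i, line in enumerate(lines) if 'symbol' in line]

# not in gives the complement of the filter.
without_line = [line for line in lines if 'line' not in line]
```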
Let me know if you have any more questions.