In Python, how do I read a file line-by-line into a list?
How do I read every line of a file in Python and store each line as an element in a list?
I want to read the file line by line and append each line to the end of the list.
34 Answers
with open(fname) as f:
    content = f.readlines()
# you may also want to remove whitespace characters like `\n` at the end of each line
content = [x.strip() for x in content]
If you are working with big data, using readlines() is not very efficient, as it can result in a MemoryError. In that case it is better to iterate over the file using for line in f: and work with each line variable.
– DarkCygnus
Aug 27 '16 at 3:07
I checked the memory profile of different ways given in the answers using the procedure mentioned here. The memory usage is far better when each line is read from the file and processed, as suggested by @DevShark here. Holding all lines in a collection object is not a good idea if memory is a constraint or the file is large. The execution time is similar in both the approaches.
– Tirtha R
Mar 2 at 23:24
Also, .rstrip() will work slightly faster if you are stripping whitespace from the ends of lines.
– Gringo Suave
Jun 15 at 19:14
See Input and Output:
with open('filename') as f:
    lines = f.readlines()
or with stripping the newline character:
lines = [line.rstrip('\n') for line in open('filename')]
Editor's note: This answer's original whitespace-stripping command, line.strip(), as implied by Janus Troelsen's comment, would remove all leading and trailing whitespace, not just the trailing \n.
If you only want to discard the newline: lines = (line.rstrip('\n') for line in open(filename))
– Janus Troelsen
Oct 11 '12 at 10:14
For a list it should be lines = [line.rstrip('\n') for line in open(filename)]
– Lazik
Oct 12 '13 at 14:32
Won't the 2nd option leave the file open (since it's not guarded by a context on its own)?
– yo'
Feb 8 '15 at 19:14
@yo' It does, but most people do not care about that in small programs. There is no harm in small programs, since the leaked file objects are garbage collected, but it is not a good habit.
– Martin Ueding
May 4 '15 at 8:01
Safer: with open('filename') as f: lines = [line.rstrip('\n') for line in f]
– becko
Feb 10 '16 at 16:36
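The question of whether the file stays open can be checked directly. The following is a minimal sketch rather than anything from the answers above; the temporary file and the names path, f and lines are assumptions made for illustration.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("first\nsecond\n")

# The with-block closes the file as soon as the block is left,
# even if the comprehension raises an exception.
with open(path) as f:
    lines = [line.rstrip("\n") for line in f]

print(lines)     # the two lines without their trailing newlines
print(f.closed)  # True: the context manager closed the file

os.remove(path)
```

With a bare open(...) inside the comprehension there is no name left to call close() on, so closing is left to the garbage collector.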
This is more explicit than necessary, but does what you want.
with open("file.txt", "r") as ins:
    array = []
    for line in ins:
        array.append(line)
This is a direct answer to the question
– Joop
Jul 14 at 9:15
This will yield an "array" of lines from the file.
lines = tuple(open(filename, 'r'))
open returns a file which can be iterated over. When you iterate over a file, you get the lines from that file. tuple can take an iterator and instantiate a tuple instance for you from the iterator that you give it. lines is a tuple created from the lines of the file.
– Noctis Skytower
Jan 5 '14 at 21:58
@MarshallFarrier Try lines = open(filename).read().split('\n') instead.
– Noctis Skytower
Dec 11 '14 at 13:56
does it close the file?
– Vanuan
Jan 3 '15 at 2:21
@NoctisSkytower I find lines = open(filename).read().splitlines() a little cleaner, and I believe it also handles DOS line endings better.
– jaynp
May 13 '15 at 5:59
@mklement0 Assuming a file of 1000 lines, a list takes up about 13.22% more space than a tuple. The figure comes from: from sys import getsizeof as g; i = [None] * 1000; round((g(list(i)) / g(tuple(i)) - 1) * 100, 2). Creating a tuple takes about 4.17% more time than creating a list (with a 0.16% standard deviation). That figure comes from running the following 30 times: from timeit import timeit as t; round((t('tuple(i)', 'i = [None] * 1000') / t('list(i)', 'i = [None] * 1000') - 1) * 100, 2). My solution favors space over speed when the need for mutability is unknown.
– Noctis Skytower
Jan 4 '16 at 16:17
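The size comparison in the comment above can be reproduced with a short script. This is only a sketch: the names items, list_size and tuple_size are illustrative, and the exact numbers depend on the Python version and platform, so they may not match the percentages quoted.

```python
import sys

# 1000 placeholder "lines"; the count mirrors the comment above.
items = [None] * 1000

list_size = sys.getsizeof(list(items))
tuple_size = sys.getsizeof(tuple(items))

# getsizeof measures only the container, not the objects it refers to.
print(list_size, tuple_size)
print(round((list_size / tuple_size - 1) * 100, 2), "percent larger")
```

Either way the difference concerns the container alone; the line strings themselves usually dominate memory use.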
If you want the \n included:
with open(fname) as f:
    content = f.readlines()
If you do not want \n included:
with open(fname) as f:
    content = f.read().splitlines()
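To make the difference between the two variants visible, here is a small self-contained sketch. The temporary file and the variable names with_newlines and without_newlines are assumptions for the example, not part of the answer.

```python
import os
import tempfile

# Write a two-line sample file.
fd, fname = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("alpha\nbeta\n")

with open(fname) as f:
    with_newlines = f.readlines()            # keeps the trailing "\n"

with open(fname) as f:
    without_newlines = f.read().splitlines()  # drops the line endings

print(with_newlines)     # ['alpha\n', 'beta\n']
print(without_newlines)  # ['alpha', 'beta']

os.remove(fname)
```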
You could simply do the following, as has been suggested:
with open('/your/path/file') as f:
    my_lines = f.readlines()
Note that this approach has 2 downsides:
1) You store all the lines in memory. In the general case, this is a very bad idea. The file could be very large, and you could run out of memory. Even if it's not large, it is simply a waste of memory.
2) This does not allow processing of each line as you read them. So if you process your lines after this, it is not efficient (requires two passes rather than one).
A better approach for the general case would be the following:
with open('/your/path/file') as f:
    for line in f:
        process(line)
Where you define your process function any way you want. For example:
def process(line):
    if 'save the world' in line.lower():
        superman.save_the_world()
(The implementation of the Superman class is left as an exercise for you).
This will work nicely for any file size and you go through your file in just 1 pass. This is typically how generic parsers will work.
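As a concrete version of the single-pass idea, the sketch below counts matching lines without ever building a list. The function name count_matching and the sample text are hypothetical; they stand in for whatever processing you actually need.

```python
import os
import tempfile


def count_matching(path, needle):
    """Count lines containing needle, holding one line in memory at a time."""
    count = 0
    with open(path) as f:
        for line in f:
            if needle in line.lower():
                count += 1
    return count


# Sample data so the sketch runs on its own.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("Save the world\nnothing here\nplease save the world again\n")

matches = count_matching(path, "save the world")
print(matches)  # 2

os.remove(path)
```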
This was exactly what I needed - and thanks for explaining the downsides. As a beginner in Python, it's awesome to understand why a solution is the solution. Cheers!
– Ephexx
May 17 '16 at 21:37
Think a bit more Corey. Do you really ever want your computer to read each line, without ever doing anything with these lines? Surely you can realize you always need to process them one way or another.
– DevShark
Dec 13 '16 at 7:31
You always need to do something with the lines. It can be as simple as printing the lines, or counting them. There is no value in having your process read the lines in memory, but not doing anything with it.
– DevShark
Dec 14 '16 at 10:22
You always need to do something with them. I think the point you are trying to make is that you might want to apply a function to all of them at once, rather than one by one. That is indeed the case sometimes. But it is very inefficient from a memory standpoint to do so, and it prevents you from reading files whose footprint is larger than your RAM. That's why generic parsers typically operate in the way I described.
– DevShark
Jun 23 '17 at 19:40
@PierreOcinom that is correct. Given that the file is opened in read-only mode, you couldn't modify the original file with the code above. To open a file for both reading and writing, use open('file_path', 'r+')
– DarkShark
Sep 14 '17 at 9:17
If you don't care about closing the file, this one-liner works:
lines = open('file.txt').read().split("\n")
The traditional way:
fp = open('file.txt')  # Open file in read mode
lines = fp.read().split("\n")  # Create a list containing all lines
fp.close()  # Close file
Using with (recommended):
with open('file.txt') as fp:
    lines = fp.read().split("\n")
It might be fine in some cases, but this doesn't close the file, even after the loop has completed - stackoverflow.com/a/1832589/232593
– Merlyn Morgan-Graham
Dec 11 '15 at 0:02
The with block closes the file automatically. No need for the final fp.close() line in that last example. See: repl.it/IMeA/0
– Merlyn Morgan-Graham
May 23 '17 at 8:57
Always care about closing the file! Be a good resource citizen!
– Nick
Jun 29 at 14:38
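One detail worth knowing about split("\n") is that it behaves differently from splitlines() when the text ends with a newline, which most files do. The strings below are made-up sample data used only to show the difference.

```python
text = "line 1\nline 2\n"  # typical file content ending in a newline

by_split = text.split("\n")
by_splitlines = text.splitlines()

print(by_split)       # ['line 1', 'line 2', '']  <- extra empty string
print(by_splitlines)  # ['line 1', 'line 2']

# With DOS line endings that were not translated on read:
dos = "a\r\nb\r\n"
dos_split = dos.split("\n")
dos_splitlines = dos.splitlines()
print(dos_split)       # ['a\r', 'b\r', '']
print(dos_splitlines)  # ['a', 'b']
```

If the trailing empty element matters for your use, splitlines() is the simpler choice.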
This should encapsulate the open command.
array = []
with open("file.txt", "r") as f:
    for line in f:
        array.append(line)
f.readlines() does the same. No need to append to an empty list.
– Corey Goldberg
Mar 6 '15 at 14:51
You are right. This provides insight into a solution if you want to do something while you are reading in the lines. Like some strip/regex transformation.
– cevaris
Mar 6 '15 at 15:44
Data into list
Assume that we have a text file with our data like in the following lines:
line 1
line 2
line 3
python
>>> with open("myfile.txt", encoding="utf-8") as file:
...     x = [l.strip() for l in file]
>>> x
['line 1', 'line 2', 'line 3']
x = []
with open("myfile.txt") as file:
    for l in file:
        x.append(l.strip())
>>> x = open("myfile.txt").read().splitlines()
>>> x
['line 1', 'line 2', 'line 3']
>>> y = [x.rstrip() for x in open("my_file.txt")]
>>> y
['line 1', 'line 2', 'line 3']
Is the encoding="utf-8" required?
– Mausy5043
Jun 3 at 8:53
@Mausy5043 No, but when you read a text file you can come across some strange characters (especially in Italian).
– Giovanni Gianni
Jun 3 at 9:55
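To illustrate why an explicit encoding can matter, here is a hedged sketch that writes and reads accented text as UTF-8. The sample words and the temporary file are assumptions for the example; without the encoding argument, open() falls back to a platform-dependent default.

```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

# Write some non-ASCII text as UTF-8.
with open(path, "w", encoding="utf-8") as out:
    out.write("città\nperché\n")

# Passing the same encoding when reading avoids depending on the platform default.
with open(path, encoding="utf-8") as f:
    x = [l.strip() for l in f]

print(x)  # ['città', 'perché']

os.remove(path)
```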
Clean and Pythonic Way of Reading the Lines of a File Into a List
First and foremost, you should focus on opening your file and reading its contents in an efficient and pythonic way. Here is an example of the way I personally DO NOT prefer:
infile = open('my_file.txt', 'r') # Open the file for reading.
data = infile.read() # Read the contents of the file.
infile.close() # Close the file since we're done using it.
Instead, I prefer the below method of opening files for both reading and writing as it
is very clean, and does not require an extra step of closing the file
once you are done using it. In the statement below, we're opening the file
for reading, and assigning it to the variable 'infile.' Once the code within
this statement has finished running, the file will be automatically closed.
# Open the file for reading.
with open('my_file.txt', 'r') as infile:
    data = infile.read()  # Read the contents of the file into memory.
Now we need to focus on bringing this data into a Python List because they are iterable, efficient, and flexible. In your case, the desired goal is to bring each line of the text file into a separate element. To accomplish this, we will use the splitlines() method as follows:
# Return a list of the lines, breaking at line boundaries.
my_list = data.splitlines()
The Final Product:
# Open the file for reading.
with open('my_file.txt', 'r') as infile:
    data = infile.read()  # Read the contents of the file into memory.
# Return a list of the lines, breaking at line boundaries.
my_list = data.splitlines()
Testing Our Code:
A fost odatã ca-n povesti,
A fost ca niciodatã,
Din rude mãri împãrãtesti,
O prea frumoasã fatã.
print my_list  # Print the list.
# Print each line in the list.
for line in my_list:
    print line
# Print the fourth element in this list.
print my_list[3]
['A fost odat\xc3\xa3 ca-n povesti,', 'A fost ca niciodat\xc3\xa3,', 'Din rude m\xc3\xa3ri \xc3\xaemp\xc3\xa3r\xc3\xa3testi,', 'O prea frumoas\xc3\xa3 fat\xc3\xa3.']
A fost odatã ca-n povesti,
A fost ca niciodatã,
Din rude mãri împãrãtesti,
O prea frumoasã fatã.
O prea frumoasã fatã.
I'd do it like this.
lines = []
with open("myfile.txt") as f:
    for line in f:
        lines.append(line)
To read a file into a list you need to do three things: open the file, read the file, and store the contents as a list.
Fortunately Python makes it very easy to do these things so the shortest way to read a file into a list is:
lst = list(open(filename))
However I'll add some more explanation.
I assume that you want to open a specific file and you don't deal directly with a file-handle (or a file-like-handle). The most commonly used function to open a file in Python is open. It takes one mandatory argument (the filename) and two optional ones (the mode and the buffering) in Python 2.7.
The filename should be a string that represents the path to the file. For example:
open('afile') # opens the file named afile in the current working directory
open('adir/afile') # relative path (relative to the current working directory)
open('C:/users/aname/afile') # absolute path (windows)
open('/usr/local/afile') # absolute path (linux)
Note that the file extension needs to be specified. This is especially important for Windows users because file extensions like .txt or .doc, etc. are hidden by default when viewed in the explorer.
The second argument is the mode. It is r by default, which means "read-only". That's exactly what you need in your case.
But in case you actually want to create a file and/or write to a file you'll need a different argument here. There is an excellent answer if you want an overview.
For reading a file you can omit the mode or pass it in explicitly:
open(filename)
open(filename, 'r')
Both will open the file in read-only mode. In case you want to read in a binary file on Windows you need to use the mode rb:
open(filename, 'rb')
On other platforms the 'b' (binary mode) is simply ignored in Python 2. (In Python 3, 'rb' makes read() return bytes instead of str on every platform.)
Now that I've shown how to open the file, let's talk about the fact that you always need to close it again. Otherwise it will keep an open file-handle to the file until the process exits (or Python garbage-collects the file-handle).
While you could use:
f = open(filename)
# ... do stuff with f
f.close()
That will fail to close the file when something between open and close throws an exception. You could avoid that by using a try and finally:
f = open(filename)
# nothing in between!
try:
    # do stuff with f
    pass
finally:
    f.close()
However, Python provides context managers that have a prettier syntax (but for open it's almost identical to the try and finally above):
with open(filename) as f:
    # do stuff with f
    pass
# The file is always closed after the with-scope ends.
The last approach is the recommended approach to open a file in Python!
Okay, you've opened the file, now how to read it?
The open function returns a file object, and it supports Python's iteration protocol. Each iteration will give you a line:
with open(filename) as f:
    for line in f:
        print(line)
This will print each line of the file. Note, however, that each line will contain a newline character \n at the end (you might want to check if your Python is built with universal newlines support - otherwise you could also have \r\n on Windows or \r on Mac as newlines). If you don't want that, you could simply remove the last character (or the last two characters on Windows):
with open(filename) as f:
    for line in f:
        print(line[:-1])
But the last line doesn't necessarily have a trailing newline, so one shouldn't use that. One could check if it ends with a trailing newline and if so remove it:
with open(filename) as f:
    for line in f:
        if line.endswith('\n'):
            line = line[:-1]
        print(line)
But you could simply remove all whitespace (including the \n character) from the end of the string. This will also remove all other trailing whitespace, so you have to be careful if that is important:
with open(filename) as f:
    for line in f:
        print(line.rstrip())
However, if the lines end with \r\n (Windows "newlines"), that .rstrip() will also take care of the \r!
Now that you know how to open the file and read it, it's time to store the contents in a list. The simplest option would be to use the list function:
with open(filename) as f:
    lst = list(f)
In case you want to strip the trailing newlines you could use a list comprehension instead:
with open(filename) as f:
    lst = [line.rstrip() for line in f]
Or even simpler: the .readlines() method of the file object by default returns a list of the lines:
with open(filename) as f:
    lst = f.readlines()
This will also include the trailing newline characters. If you don't want them, I would recommend the [line.rstrip() for line in f] approach because it avoids keeping two lists containing all the lines in memory.
There's an additional option to get the desired output, however it's rather "suboptimal": read the complete file in a string and then split on newlines:
with open(filename) as f:
    lst = f.read().split('\n')
or:
with open(filename) as f:
    lst = f.read().splitlines()
These take care of the trailing newlines automatically because the split character isn't included. However, they are not ideal because you keep the file both as a string and as a list of lines in memory!
Here's one more option, using a list comprehension on the file:
lines = [line.rstrip() for line in open('file.txt')]
This should be a more efficient way, as most of the work is done inside the Python interpreter.
rstrip() potentially strips all trailing whitespace, not just the \n; use .rstrip('\n').
– mklement0
May 22 '15 at 16:39
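The difference between the three stripping calls discussed in this thread is easy to see on a line that has meaningful whitespace. The sample string and variable names below are made up for illustration.

```python
line = "  indented value\t \n"

stripped = line.strip()             # removes whitespace on both ends
right_stripped = line.rstrip()      # removes all trailing whitespace
newline_only = line.rstrip("\n")    # removes only trailing newlines

print(repr(stripped))        # 'indented value'
print(repr(right_stripped))  # '  indented value'
print(repr(newline_only))    # '  indented value\t '
```

Which one is right depends on whether leading indentation or trailing tabs and spaces carry meaning in your data.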
Another option is numpy.genfromtxt, for example:
import numpy as np
data = np.genfromtxt("yourfile.dat", delimiter="\n")
This will make data a NumPy array with as many rows as are in your file.
If you'd like to read a file from the command line or from stdin, you can also use the fileinput module:
# reader.py
import fileinput
content = []
for line in fileinput.input():
    content.append(line.strip())
fileinput.close()
Pass files to it like so:
$ python reader.py textfile.txt
Read more here: http://docs.python.org/2/library/fileinput.html
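For completeness, fileinput can also be given file names directly instead of reading them from the command line. The sketch below assumes Python 3 (where fileinput.input works as a context manager) and uses a temporary file; the names path, stream and content are illustrative.

```python
import fileinput
import os
import tempfile

fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("one\ntwo\n")

# Passing files= explicitly stops fileinput from looking at sys.argv or stdin.
with fileinput.input(files=[path]) as stream:
    content = [line.rstrip("\n") for line in stream]

print(content)  # ['one', 'two']

os.remove(path)
```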
The simplest way to do it
A simple way is to:
In one line, that would give:
lines = open('C:/path/file.txt').read().splitlines()
f = open("your_file.txt",'r')
out = f.readlines() # will append in the list out
Now variable out is a list (array) of what you want. You could either do:
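for line in out:
    print line
or
for line in f:
    print line
you'll get the same results, provided the second loop runs instead of readlines(); once readlines() has consumed the file, iterating over f yields nothing more.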
Just use the splitlines() function. Here is an example.
inp = "file.txt"
data = open(inp)
dat = data.read()
lst = dat.splitlines()
print lst
# print(lst) # for python 3
In the output you will have the list of lines.
A real easy way:
with open(file) as g:
    stuff = g.readlines()
If you want to make it a fully-fledged program, type this in:
file = raw_input("Enter EXACT file name: ")
with open(file) as g:
    stuff = g.readlines()
print(stuff)
exit = raw_input("Press enter when you are done.")
For some reason, it doesn't read .py files properly.
Read and write text files with Python 2 and Python 3; it works with Unicode
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Define data
lines = [' A first string ',
'A Unicode sample: €',
'German: äöüß']
# Write text file
with open('file.txt', 'w') as fp:
    fp.write('\n'.join(lines))
# Read text file
with open('file.txt', 'r') as fp:
    read_lines = fp.readlines()
    read_lines = [line.rstrip('\n') for line in read_lines]
print(lines == read_lines)
Things to notice:
For your application, the following might be important:
See also: Comparison of data serialization formats
In case you are rather looking for a way to make configuration files, you might want to read my short article Configuration files in Python.
You can just open your file for reading using:
file1 = open("filename","r")
# And for reading use
lines = file1.readlines()
file1.close()
The list lines will contain all your lines as individual elements, and you can access a specific element using lines[linenumber - 1], as Python starts counting from 0.
If you are faced with a very large file and want to read it faster (imagine you are in a Topcoder/Hackerrank coding competition), you might read a considerably bigger chunk of lines into a memory buffer at one time, rather than just iterating line by line at file level.
buffersize = 2**16
with open(path) as f:
    while True:
        lines_buffer = f.readlines(buffersize)
        if not lines_buffer:
            break
        for line in lines_buffer:
            process(line)
what does process(line) do? I get an error that there is not such variable defined. I guess something needs importing and I tried to import multiprocessing.Process, but that's not it I guess. Could you please elaborate? Thanks
– Newskooler
Apr 6 '17 at 8:40
process(line) is a function that you need to implement to process the data. For example, if you use print(line) instead of that line, it will print each line from the lines_buffer.
– Khanal
Apr 26 '17 at 13:27
f.readlines(buffersize) returns an immutable buffer. If you want to read directly into your buffer, you need to use the readinto() function. It will be much faster.
– David Dehghan
Jun 30 at 10:28
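The batching idea from this answer can be wrapped in a generator. This is a sketch under stated assumptions: read_in_batches is a hypothetical helper name, the sample file is generated on the fly, and the hint is only an approximate size limit, so batch sizes vary. Each batch that readlines(hint) returns here is an ordinary list of strings.

```python
import os
import tempfile


def read_in_batches(path, hint=2**16):
    """Yield lists of lines; each list holds roughly `hint` bytes of text."""
    with open(path) as f:
        while True:
            batch = f.readlines(hint)
            if not batch:
                break
            yield batch


# Generate 1000 short lines of sample data.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("".join("row %d\n" % i for i in range(1000)))

batches = list(read_in_batches(path, hint=1024))
total_lines = sum(len(batch) for batch in batches)
print(len(batches), total_lines)

os.remove(path)
```

Whether this is actually faster than plain iteration depends on the workload, since file objects already buffer their reads.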
To my knowledge Python doesn't have a native array data structure. But it does support the list data structure which is much simpler to use than an array.
array = []  # declaring a list with name 'array'
with open(PATH, 'r') as reader:
    for line in reader:
        array.append(line)
Python does have an array (see the standard library's array module), but the question asked for a list.
– Corey Goldberg
Dec 12 '16 at 23:22
Use this:
import pandas as pd
data = pd.read_csv(filename) # You can also add parameters such as header, sep, etc.
array = data.values
data is a DataFrame; its values attribute gives an ndarray. You can also get a list by using array.tolist().
You can easily do it by the following piece of code:
lines = open(filePath).readlines()
Introduced in Python 3.4, pathlib has a really convenient method for reading in text from files (read_text itself arrived in Python 3.5), as follows:
from pathlib import Path
p = Path('my_text_file')
lines = p.read_text().splitlines()
(The splitlines call is what turns it from a string containing the whole contents of the file to a list of lines in the file.)
pathlib has a lot of handy conveniences in it. read_text is nice and concise, and you don't have to worry about opening and closing the file. If all you need to do with the file is read it all in one go, it's a good choice.
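A runnable variant of the pathlib approach is sketched below. The temporary directory, the explicit encoding="utf-8" argument, and the names lines and kept are assumptions added for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    p = Path(tmpdir) / "my_text_file"
    p.write_text("foo\nbar\nbaz\n", encoding="utf-8")

    # read_text opens, reads, and closes the file in one call.
    lines = p.read_text(encoding="utf-8").splitlines()
    kept = p.read_text(encoding="utf-8").splitlines(keepends=True)

print(lines)  # ['foo', 'bar', 'baz']
print(kept)   # ['foo\n', 'bar\n', 'baz\n']
```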
You could also use the loadtxt command in NumPy. This checks for fewer conditions than genfromtxt, so it may be faster.
import numpy
data = numpy.loadtxt(filename, delimiter="\n")
#!/bin/python3
import os
import sys
abspath = os.path.abspath(__file__)
dname = os.path.dirname(abspath)
filename = os.path.join(dname, sys.argv[1])  # join with a path separator
arr = open(filename).read().split("\n")
print(arr)
python3 somefile.py input_file_name.txt
I like to use the following. Reading the lines immediately.
contents = []
for line in open(filepath, 'r').readlines():
    contents.append(line.strip())
Or using list comprehension:
contents = [line.strip() for line in open(filepath, 'r').readlines()]
With a filename, handling the file from a Path(filename) object, or directly with open(filename) as f, do one of the following:
list(fileinput.input(filename))
using with path.open() as f, call f.readlines()
list(f)
path.read_text().splitlines()
path.read_text().splitlines(keepends=True)
iterate over fileinput.input or f and list.append each line one at a time
pass f to a bound list.extend method
use f in a list comprehension
I explain the use-case for each below.
This is an excellent question. First, let's create some sample data:
from pathlib import Path
Path('filename').write_text('foo\nbar\nbaz')
File objects are lazy iterators, so just iterate over them.
filename = 'filename'
with open(filename) as f:
    for line in f:
        line # do something with the line
Alternatively, if you have multiple files, use fileinput.input, another lazy iterator. With just one file:
import fileinput
for line in fileinput.input(filename):
    line # process the line
or for multiple files, pass it a list of filenames:
for line in fileinput.input([filename]*2):
    line # process the line
Again, f and fileinput.input above both are/return lazy iterators. You can only use an iterator one time, so to provide functional code while avoiding verbosity I'll use the slightly more terse fileinput.input(filename) where apropos from here.
Ah, but you want it in a list for some reason? I'd avoid that if possible. But if you insist... just pass the result of fileinput.input(filename) to list:
list(fileinput.input(filename))
Another direct answer is to call f.readlines, which returns the contents of the file (up to an optional hint number of characters, so you could break this up into multiple lists that way).
You can get to this file object two ways. One way is to pass the filename to the open builtin:
filename = 'filename'
with open(filename) as f:
    f.readlines()
or using the new Path object from the pathlib module (which I have become quite fond of, and will use from here on):
from pathlib import Path
path = Path(filename)
with path.open() as f:
    f.readlines()
list will also consume the file iterator and return a list - a quite direct method as well:
with path.open() as f:
    list(f)
If you don't mind reading the entire text into memory as a single string before splitting it, you can do this as a one-liner with the Path object and the splitlines() string method. By default, splitlines removes the newlines:
path.read_text().splitlines()
If you want to keep the newlines, pass keepends=True:
path.read_text().splitlines(keepends=True)
I want to read the file line by line and append each line to the end of the list.
Now this is a bit silly to ask for, given that we've demonstrated the end result easily with several methods. But you might need to filter or operate on the lines as you make your list, so let's humor this request.
Using list.append would allow you to filter or operate on each line before you append it:
line_list = []
for line in fileinput.input(filename):
    line_list.append(line)
line_list
Using list.extend would be a bit more direct, and perhaps useful if you have a preexisting list:
line_list = []
line_list.extend(fileinput.input(filename))
line_list
Or more idiomatically, we could instead use a list comprehension, and map and filter inside it if desirable:
[line for line in fileinput.input(filename)]
Or even more directly, to close the circle, just pass it to list to create a new list directly without operating on the lines:
list(fileinput.input(filename))
You've seen many ways to get lines from a file into a list, but I'd recommend you avoid materializing large quantities of data into a list and instead use Python's lazy iteration to process the data if possible.
That is, prefer fileinput.input or with path.open() as f.
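The earlier remark about mapping and filtering inside a list comprehension can be sketched as follows. The sample content and the rule of skipping blank lines and lines starting with # are assumptions chosen purely for illustration.

```python
import os
import tempfile

fd, filename = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("foo\n\n# a comment\nbar\n")

with open(filename) as f:
    # "map": strip the newline; "filter": skip blank lines and comment lines.
    line_list = [
        line.rstrip("\n")
        for line in f
        if line.strip() and not line.startswith("#")
    ]

print(line_list)  # ['foo', 'bar']

os.remove(filename)
```

Swapping the square brackets for parentheses would give a lazy generator instead of a list, in keeping with the recommendation above.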
Don't use file.readlines() in a for-loop, a file object itself is enough: lines = [line.rstrip('\n') for line in file]
– jfs
Jan 14 '15 at 10:52