Thursday, 8 August 2013

Creating a list of every word from a text file without spaces, punctuation

I have a long text file (a screenplay). I want to turn this text file into
a list of individual words so that I can search through it later on.
The code I have at the moment is:
with open('screenplay.txt', 'r') as f:
    words = f.read().split()  # split() already returns a list
print(words)
I think this splits all the words into a list, but I'm having trouble
removing the extra characters, such as commas and periods at the end of
words. I also want to convert capital letters to lower case, so that a
lower-case search matches both capitalized and lower-case words. Any help
would be fantastic :)
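
One possible approach to the two steps described above (stripping punctuation from the ends of words and lower-casing them) is sketched here. This is only a sketch under a few assumptions: the function name extract_words and the sample string are made up for illustration, and "punctuation" is taken to mean the ASCII characters in Python's string.punctuation. Apostrophes and hyphens inside a word are left alone, since only the ends of each token are stripped.

```python
import string


def extract_words(text):
    """Split text on whitespace, strip punctuation from both ends of each
    token, lower-case it, and drop tokens that end up empty."""
    words = []
    for token in text.split():
        # strip() removes the listed characters only from the ends of the
        # token, so "Don't" keeps its apostrophe while "stop..." loses its dots
        word = token.strip(string.punctuation).lower()
        if word:  # skip tokens that were pure punctuation, such as "-"
            words.append(word)
    return words


# Hypothetical sample text standing in for the contents of screenplay.txt
sample = "INT. KITCHEN - DAY. Hello, World! Don't stop..."
print(extract_words(sample))
```

To run it on the real file, the text read from screenplay.txt could be passed in place of the sample string. Because every word in the result is lower case, a search term would need to be lower-cased too before looking it up.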
