The Problem:

How to find both the number of lines and the number of words in a potentially large document using R, and return them as a table.

The solution:

First, install and load the qdap package:

[code lang=”r”]install.packages("qdap");library(qdap)[/code]

Load the text document (readLines returns a character vector with one element per line):

[code lang=”r”]doc = readLines("doc.txt", ok = TRUE)[/code]
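If the document is too large to hold in memory comfortably, one option is to read it through a connection in chunks and keep running totals. The following is only a sketch in base R: the function name count_file_chunked and the chunk_size parameter are made up for illustration, and words are counted by splitting on whitespace, which may not match the cleaning rules that qdap applies.

```r
# Hypothetical helper (not part of qdap): tally lines and words chunk by chunk
count_file_chunked = function(path, chunk_size = 10000) {
  con = file(path, open = "r")   # open a read connection so readLines resumes where it stopped
  on.exit(close(con))            # always close the connection when the function exits
  n_lines = 0
  n_words = 0
  repeat {
    chunk = readLines(con, n = chunk_size, ok = TRUE)  # read at most chunk_size lines
    if (length(chunk) == 0) break                      # nothing left to read
    n_lines = n_lines + length(chunk)
    # split each trimmed line on runs of whitespace; empty lines contribute 0 tokens
    n_words = n_words + sum(lengths(strsplit(trimws(chunk), "[[:space:]]+")))
  }
  c(lines = n_lines, words = n_words)
}

# Small self-contained check using a temporary file
tmp = tempfile(fileext = ".txt")
writeLines(c("one two three", "", "four five"), tmp)
res = count_file_chunked(tmp, chunk_size = 2)
```

A larger chunk_size means fewer reads but more memory per step; the value used here is arbitrary.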

Read in the “WordsLines” function:

[code lang=”r”]
WordsLines = function(dataframe, names1, names2){
Words = data.frame(dataframe, stringsAsFactors = FALSE) #since the document is a character vector, put it into a dataframe
Wc = wc(Words[,1]) #get the word count of each input (all rows) of the first column
Words1 = data.frame(Wc) #put that word count into a dataframe
Words1$Wc = as.numeric(Words1$Wc) #make sure it is numeric
names(Words1)[1] = "Words" #change the column name to "Words"
Words1 = sum(Words1, na.rm = TRUE) #sum all the word counts of the entire column (empty lines give NA)
Lines = nrow(Words) #find the number of lines (rows) in the entire dataframe
final = cbind(Lines, Words1) #combine the line count and word count into one table
colnames(final) = c(names1, names2) #change the names of the columns to fit the particular dataset
final #return the table
}
[/code]

Call the function:

[code lang=”r”]WordsLines(doc, "Doc Lines", "Doc Words")[/code]

This should return something like:

[code lang=”r”]
     Doc Lines Doc Words
[1,]   1010242  33482314[/code]
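For a quick sanity check without qdap, the same table can be approximated in base R. This is a minimal sketch, not the approach used above: count_lines_words is a hypothetical name, and it treats any run of non-whitespace characters as a word, so its totals can differ from those of wc, which applies its own text cleaning.

```r
# Hypothetical base-R helper: count lines and whitespace-separated words
count_lines_words = function(lines, names1 = "Lines", names2 = "Words") {
  trimmed = trimws(lines)                     # drop leading/trailing whitespace
  tokens = strsplit(trimmed, "[[:space:]]+")  # split each line on runs of whitespace
  per_line = lengths(tokens)                  # tokens per line (0 for empty lines)
  final = cbind(length(lines), sum(per_line)) # one-row table: line count, word count
  colnames(final) = c(names1, names2)         # label the columns as requested
  final
}

# Tiny in-memory example: three lines, one of them empty
sample_doc = c("The quick brown fox", "", "jumps over the lazy dog")
res = count_lines_words(sample_doc, "Doc Lines", "Doc Words")
```

Running both versions on a small file with a known word count is an easy way to see where the two definitions of a "word" diverge.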


