Function in R for Word and Line Count Table

Here I present a new function I created to find the count of lines and words in a text document and return them in the form of a table. It uses the wc function from the “qdap” package in R, as well as the base R functions sum, nrow, as.numeric, as.data.frame and cbind.

The Problem:

How to find both the number of lines and the number of words in a potentially large document using R, and return the result as a table.

The solution:

First, install and load the qdap package

install.packages("qdap")
library(qdap)

Load the text document

doc = readLines("doc.txt", ok = TRUE)
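For context, readLines returns a character vector with one element per line of the file, which is why the function further down can treat each row as a line. A minimal sketch of that behaviour, using a throwaway temporary file and the made-up name doc_demo (neither appears in the original post):

```r
# Write two lines to a temporary file (illustrative data only)
tmp <- tempfile(fileext = ".txt")
writeLines(c("first line of text", "second line"), tmp)

# Read it back the same way as doc.txt above
doc_demo <- readLines(tmp, ok = TRUE)

is.character(doc_demo) # TRUE: a plain character vector
length(doc_demo)       # 2: one element per line in the file
```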

Define the “WordsLines” function

WordsLines = function(dataframe, names1, names2){
  Words = as.data.frame(dataframe, stringsAsFactors = FALSE) #the input is a character vector (one element per line), so put it into a one-column dataframe
  Wc = wc(Words[,1]) #get the word count of each row of the first column
  Words1 = as.data.frame(Wc) #put those word counts into a dataframe
  Words1$Wc = as.numeric(Words1$Wc) #make sure the counts are numeric
  names(Words1)[1] = "Words" #change the column name to "Words"
  Words1 = sum(Words1, na.rm = TRUE) #sum the word counts of the entire column, ignoring NA counts
  Lines = nrow(Words) #find the number of lines (rows) in the dataframe
  final = cbind(Lines, Words1) #combine the line count and word count into one table
  colnames(final) = c(names1, names2) #rename the columns to fit the particular dataset
  final #return the table
}

Call the function

WordsLines(doc, "Doc Lines", "Doc Words")
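As a rough cross-check that does not need qdap, the same kind of table can be sketched with base R only. The helper name words_lines_base is hypothetical, and it counts whitespace-separated tokens, which is an assumption: qdap's wc may treat punctuation and digits differently, so the two word totals are not guaranteed to match exactly.

```r
# Hypothetical base-R helper (not part of qdap or the function above):
# counts lines and whitespace-separated words in a character vector.
words_lines_base <- function(lines, names1 = "Lines", names2 = "Words") {
  lines <- as.character(lines)
  # Trim each line, then split on runs of whitespace
  tokens <- strsplit(trimws(lines), "[[:space:]]+")
  # Count only non-empty tokens, so blank lines contribute zero words
  word_counts <- vapply(tokens, function(x) sum(nzchar(x)), integer(1))
  # Build a one-row table in the same shape as WordsLines returns
  final <- cbind(length(lines), sum(word_counts))
  colnames(final) <- c(names1, names2)
  final
}

# Small illustrative input: 3 lines, 5 words
words_lines_base(c("hello world", "", "  one two  three "), "Doc Lines", "Doc Words")
```

Blank lines still count as lines here but add nothing to the word total, which matches the intent of using na.rm in the qdap version.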

Should return something like this:

     Doc Lines Doc Words
[1,]   1010242  33482314
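For documents too large to hold comfortably in memory, one possible variation is to read the file in chunks through a connection and accumulate the counts. This is only a sketch under assumptions: the name count_lines_words_chunked and the chunk_size parameter are invented for illustration, and words are counted as whitespace-separated tokens in base R rather than with qdap's wc.

```r
# Hypothetical chunked counter (base R only, not from the original post)
count_lines_words_chunked <- function(path, chunk_size = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con)) # always release the connection
  n_lines <- 0
  n_words <- 0
  repeat {
    # Read up to chunk_size lines; an empty result means end of file
    chunk <- readLines(con, n = chunk_size, ok = TRUE, warn = FALSE)
    if (length(chunk) == 0) break
    n_lines <- n_lines + length(chunk)
    # Count non-empty whitespace-separated tokens in this chunk
    tokens <- strsplit(trimws(chunk), "[[:space:]]+")
    n_words <- n_words + sum(vapply(tokens, function(x) sum(nzchar(x)), integer(1)))
  }
  c(lines = n_lines, words = n_words)
}

# Usage on a small temporary file (illustrative data only)
tmp_big <- tempfile(fileext = ".txt")
writeLines(c("a b c", "", "d e"), tmp_big)
count_lines_words_chunked(tmp_big, chunk_size = 2L)
```

Only one chunk is held in memory at a time, so the chunk size trades memory use against the number of read calls.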