{"id":2085,"date":"2016-05-01T20:44:24","date_gmt":"2016-05-01T20:44:24","guid":{"rendered":"http:\/\/www.levibrackman.com\/?p=2085"},"modified":"2016-05-01T20:44:24","modified_gmt":"2016-05-01T20:44:24","slug":"function-r-word-line-count-table","status":"publish","type":"post","link":"https:\/\/www.levibrackman.com\/?p=2085","title":{"rendered":"Function in R for Word and Line Count Table"},"content":{"rendered":"<p>Here I present a new function I created to find the count of lines and words in a text document and return them in the form of a table. It uses the <strong><em>wc<\/em><\/strong> function from the &#8220;qdap&#8221; package in R as well as the base R functions <em><strong>sum<\/strong><\/em>, <strong><em>nrow<\/em><\/strong>, <em><strong>as.numeric<\/strong><\/em>, <em><strong>as.data.frame<\/strong><\/em> and <em><strong>cbind<\/strong><\/em>.<\/p>\n<p><strong>The Problem:<\/strong><\/p>\n<p>How can we find both the number of lines and the number of words in a potentially large document using R, and return them as a table?<\/p>\n<p><strong>The Solution:<\/strong><\/p>\n<p>First, install and load the qdap package.<\/p>\n<p>[code lang=&quot;r&quot;]install.packages(&quot;qdap&quot;); library(qdap)[\/code]<\/p>\n<p><strong>Load the text document<\/strong><\/p>\n<p>[code lang=&quot;r&quot;]doc = readLines(&quot;doc.txt&quot;, ok = TRUE) #read the file into a character vector with one element per line[\/code]<\/p>\n<p><strong>Read in the &#8220;WordsLines&#8221; function<\/strong><\/p>\n<p>[code lang=&quot;r&quot;]<br \/>\nWordsLines = function(dataframe, names1, names2){<br \/>\nWords = as.data.frame(dataframe, stringsAsFactors = FALSE) #the text is a character vector (one element per line), so put it into a one-column dataframe, keeping it as character rather than factor<br \/>\nWc = wc(Words[,1]) #get the word count of each row of the first column<br \/>\nWords1 = as.data.frame(Wc) #put that word count into a dataframe<br \/>\nWords1$Wc = as.numeric(Words1$Wc) #make sure it is numeric<br \/>\nnames(Words1)[1] = &quot;Words&quot; #change the column name to &quot;Words&quot;<br \/>\nWords1 = sum(Words1, na.rm = TRUE) #sum the word counts of the entire column, ignoring NAs<br \/>\nLines = nrow(Words) #find the number of lines (rows) in the entire dataframe<br \/>\nfinal = cbind(Lines, Words1) #combine the line count and word count into one table<br \/>\ncolnames(final) = c(names1, names2) #change the names of the columns to fit the particular dataset<br \/>\nfinal #return the table<br \/>\n}<br \/>\n[\/code]<\/p>\n<p><strong>Call the function<\/strong><\/p>\n<p>[code lang=&quot;r&quot;]WordsLines(doc, &quot;Doc Lines&quot;, &quot;Doc Words&quot;)[\/code]<\/p>\n<p><strong>It should return something like this:<\/strong><\/p>\n<p>[code lang=&quot;r&quot;]<br \/>\n     Doc Lines Doc Words<br \/>\n[1,]   1010242  33482314[\/code]<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Here I present a new function I created to find the count of lines and words in a text document and return them in the form of a table. It uses the wc function from the &#8220;qdap&#8221; package in R as well as the base R functions sum, nrow, as.numeric, as.data.frame and cbind. The Problem:&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0},"categories":[52,54,53,51],"tags":[55],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.levibrackman.com\/index.php?rest_route=\/wp\/v2\/posts\/2085"}],"collection":[{"href":"https:\/\/www.levibrackman.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.levibrackman.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.levibrackman.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.levibrackman.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2085"}],"version-history":[{"count":5,"href":"https:\/\/www.levibrackman.com\/index.php?rest_route=\/wp\/v2\/posts\/2085\/revisions"}],"predecessor-version":[{"id":2090,"href":"https:\/\/www.levibrackman.com\/index.php?rest_route=\/wp\/v2\/posts\/2085\/revisions\/2090"}],"wp:attachment":[{"href":"https:\/\/www.levibrackman.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2085"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.levibrackman.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2085"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.levibrackman.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2085"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}