R notes

Command

  • && and || return a single value, not working vector wise. and this only compare as least as they can
  • & and | are working vector wise.
  • %*% matrix; inner product; use drop to remove dimension
  • %o% outer product
  • %x% kronecker product

Number

1L
Inf
NaN

Attributes

attributes()

Batch

if you want to log off

nohup R CMD BATCH code.R

Debug

  • traceback()
  • debug(foo)
  • browser()
  • recover()

Efficiency

  • measure time : system.time(), proc.time()[3]

Progress bar

total <- 20
pb <- txtProgressBar(min = 0, max = total, style = 3)
for(i in 1:total){
Sys.sleep(0.1)
setTxtProgressBar(pb, i)
}
close(pb)

To get a GUI progress bar the tkProgressBar() function from the tcltk package can used.

total <- 20
# create progress bar
pb <- tkProgressBar(title = "progress bar", min = 0,
				max = total, width = 300)

for(i in 1:total){
Sys.sleep(0.1)
setTkProgressBar(pb, i, label=paste( round(i/total*100, 0),
									"% done"))
}
close(pb)

Profiling

Rprof()
foo()
Rprof(NULL)
summaryRprof()

Parallel

Example :

  • separate runs for a cross validation
  • bootstrap runs
  • different chains of some MCMC simulation
  • NOT parts of a single MCMC chain

Namesapces

stats::ks.test

Class S3, method

dropbox: s3.pdf

apropos("^mean.")
methods(mean)
methods(class="nls")
getAnywhere(nobs.nls)
stats:::nobs.nls

str

Write a generic function

reveal <- function(x, ...) UseMethod("reveal")

Write method

reveal.default
reveal.digits

class(x) <- c("glm", "digits")
return(x)

print.class
summary.class
plot.class
coef.class

Factors

table(x)
unclass(x)

Vignette

browseVignettes(package='knitr')

Rscript

> Rscript blbl.R 'avg'

Rscrip in.R > out.txt

Rscript [options] [-e expression] file [args]

#! /usr/bin/Rscript --vanilla --default-packages=utils
args <- commandArgs(TRUE)
res <- try(install.packages(args))
if(inherits(res, "try-error")) q(status=1) else q()

#!/bin/Rscript
args <- as.numeric(commandArgs(TRUE)[1])

Read and Write

read.table
read.csv
readLines
source
dget
load ;

classes <- sapply(initial, class)
taball <- read.table("ff", colClasses=classes)

str(.Platform)
version

Scope

search()
first global env and base is last

apply

lapply: list
sapply: ~=lapply, but simplify
apply
tapply: subsets, table apply : tapply(X, INDEX, FUN, simplify=F)
mapply: multivaraite of lapply

split

split(x, f, drop=FALSE)
interaction(f1, f2) or list(f1, f2)

Plot

panel.function
plotmath

Main title for multiple subplot

par(mfrow = c(2, 2), oma = c(0, 0, 3, 0))
plot...
mtext("main title", outer = TRUE, cex = 1.5)

https://stat.ethz.ch/pipermail/r-help/2008-July/168163.html

optimization

optim
nlm
optimize

Regrx

^I think ; begin
morning$ : end
[Bb][u][Ss][hH]
[a-z][0-9][a-zA-Z]
[^?.]$ ; no end with ? or .
.  : any character
flood|fire  : or any words
^(Good|Bad) : (scope)
\. : period sign
( [Ww]\.)?  : optional
(.*) : * is repete ,even none.
+ means at least one
( +[^ ]+ +){1,5} : interval quantifiers, several words
\1, \2, : refer (**)  same one
*  is greedy for longest match
to make not greedy : (.*?)

example:

data = readLine("tfsdf.txt")
length(grep("[cC]ause: Shooting", data)
setdiff(i, j) : j-i
value= TRUE : returns the actuall elements
grepl : return logical vector

regexpr : return index of match too
substr(string, begin, end)
regmatches(r) : efficient to substr

remove objects starting with p (by LaoMeng from NewSmth):

obj_p = ls(par = "^p")
rm(list = obj_p)

sub

sub("fdsfs|fjds", "", x)
gsub : replace not only the first, but all of them
as.Date

regexec : return index and length
regmatches(data, r)

Class and Methods

library(methods)
setClass()
?Classes
?Methods
?setClass, ?setMethod, ?setGeneric

S3

methods("mean")
mean.class

S4

show
showMethods("show")

getS3method(<generic>, <class>)
getMethod(<generic>, <signature>)

setClass("polygon", representation(x="numeric", y="numeric"))
setMethod("plot", "polygon", function(x,y, ...) {
	plot(x@x, x@y, type="n", ...)
	xp <- c(x@x, x@x[1])
	yp <- c(x@y, x@y[1])
	lines(xp, yp)
	})
showMethods("plot")

R workflow

reference : Rob Hyndman

  1. every paper, book or report is a ‘project’
  2. savepdf

     savepdf <- function(file, width = 16, height = 10){
         fname <- paste("figs/", file, ".pdf", sep = "")
         pdf(fname, width = width/2.54, height = height/2.54, pointsize = 10)
         par(mgp = c(2.2, 0.45, 0), tcl = -0.4, mar = c(3.3, 3.6, 1.1, 1.1))
     }
     savepdf("fig1")
     plot(x,y)
     dev.off()
    

Reduce

To cumulatively multiple a matrix, like mat^n %*% vec use Reduce:

Reduce(f=function(v, x) mat %*% v, x=1:3, init=vec, accumulate=TRUE)

Matrix computation

square root of a matrix

library(expm)
solve(sqrtm(S))

power of matrix

mat %^% power

File

exist:

file.exists(file)

read multiple files

http://danielmarcelino.com/looping-through-files/

path = ""
out.file = ""
file.names = dir(path, pattern = ".txt")
for (i in 1:length(file.names)){
	file = read.table(...)
	out.file = rbind(out.file, file)
}
write.table(out.file, ...)

Dyn.load

is.loaded(function name)

If the function comes from Fortran, then the function name should be lowercase.

Bootstrap

library(boot)
betafun = function(data, indices, formula) {
	d = data[indices, ]
	return(lm(d[,1] ~ d[,2], data = d)$coef[2])
}
bootbet = boot(data = dat, statistic = betfun, R = 1000)
names(bootbet)
plot(bootbet)
hist(bootbet$t, breaks = 100)
boot.ci(bootobject, conf = .95, type = 'basic'/'stud'/'perc'/bca/all/norm, index = 1)

MLE

http://www.mayin.org/ajayshah/KB/R/documents/mle/mle.html

optim()
nlminb()

Difference between require and library

require is often used within functions. If the library is not found, then it would give a warning. While for library, it would give an error.

Robust Code Style

Use sig, assertive and testthat package.

Reference

see dropbox / R / notes:

  1. writing efficient and parallel code in R

Read Excel

library(gdata)
dat1 <- read.xls("haha.xls", sheet = 1)

or XLConnect recommmended:

require(XLConnect)
wb <- loadWorkbook(system.file("demoFiles/mtcars.xlsx", package = "XLConnect"))
wb <- loadWorkbook("myfile.xlsx")
lst = readWorksheet(wb, sheet = getSheets(wb))
lst = readWorksheet(wb, sheet = "Sheet1")

Subset

subset(dat, ID=='minzhao')

reshape

l <- reshape(hsb2,
varying = c("read", "write", "math", "science", "socst"),
v.names = "score",
timevar = "subj",
times = c("read", "write", "math", "science", "socst"),
new.row.names = 1:1000,
direction = "long")

Check package Version

packageVersion("twitteR")
sessionInfo()

Plot with New window

x11()

or

dev.new()

Date

reference: http://www.statmethods.net/input/dates.html, http://www.ats.ucla.edu/stat/r/faq/dates.htm, http://www.stat.berkeley.edu/classes/s133/dates.html

as.Date, strftime, ISOdatetime

days <- mydates[1] - mydates[2]

Sys.Date(): today's date
date(): current date and time

dat <- mutate(dat, newdate = as.POSIXct(strptime(paste(date, time, sep=' '), '%Y-%m-%d %H:%M:%S')))

format

Symbol Example
%d 01-31
%a Mon
%A Monday
%m 00-12
%b Jan
%B January
%y 07
%Y 2007
as.Date("01/30/2014", "%d/%m/%Y")

functions

weekdays(dat)
months(dat)
hist(rdates, "months", format = "%d %b")
range(date)
mean(date)
difftime(b2, b1, units = 'weeks')
seq(date, by = 'days', length = 10) ## by = '2 weeks'
table(format(data, '%A'))
Friday    Monday  Thursday   Tuesday Wednesday
    2         3         1         2         3
cut(date, "year")


Published

23 June 2012

Modified

4 March 2014

Tags