R notes
Command
&& and || return a single value and are not vectorized. They evaluate left to right and stop as soon as the result is known (short-circuit).
& and | work element-wise on vectors.
%*%: matrix product (inner product for two vectors); use drop() to remove the extra dimension.
%o%: outer product
%x%: Kronecker product
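A small sketch of how these operators behave; the vectors and variable names are made up for illustration.

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

# Element-wise operators: one result per position
elementwise_and <- x & y
elementwise_or  <- x | y

# Scalar operator with short-circuit: the right-hand side is never evaluated,
# so stop() is not called
short_circuit <- FALSE && stop("not reached")

a <- 1:3
b <- 4:6
inner    <- drop(a %*% b)               # 1x1 matrix reduced to a plain number
outer_ab <- a %o% b                     # 3x3 matrix with entries a[i] * b[j]
kron     <- diag(2) %x% matrix(1, 2, 2) # 4x4 block-diagonal matrix
```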
Number
1L
Inf
NaN
Attributes
attributes()
Batch
To keep a job running after you log off:
nohup R CMD BATCH code.R
Debug
traceback()
debug(foo)
browser()
recover()
Efficiency
- measure time: system.time(), proc.time()[3]
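Both timing approaches might be used roughly as follows; slow_sum is a hypothetical function written only to have something to time.

```r
# Hypothetical workload: a deliberately slow loop
slow_sum <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + i
  total
}

# system.time() wraps one expression and reports user, system and elapsed time
timing <- system.time(result <- slow_sum(1e5))
elapsed_1 <- timing[["elapsed"]]

# proc.time()[3] is the elapsed wall-clock time; take the difference by hand
start <- proc.time()[3]
result2 <- slow_sum(1e5)
elapsed_2 <- proc.time()[3] - start
```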
Progress bar
total <- 20
pb <- txtProgressBar(min = 0, max = total, style = 3)
for (i in 1:total) {
  Sys.sleep(0.1)
  setTxtProgressBar(pb, i)
}
close(pb)
To get a GUI progress bar, the tkProgressBar() function from the tcltk package can be used.
library(tcltk)
total <- 20
# create progress bar
pb <- tkProgressBar(title = "progress bar", min = 0,
                    max = total, width = 300)
for (i in 1:total) {
  Sys.sleep(0.1)
  setTkProgressBar(pb, i, label = paste(round(i/total*100, 0),
                                        "% done"))
}
close(pb)
Profiling
Rprof()
foo()
Rprof(NULL)
summaryRprof()
Parallel
Example :
- separate runs for a cross validation
- bootstrap runs
- different chains of some MCMC simulation
- NOT parts of a single MCMC chain
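Independent bootstrap runs like those listed above could be spread over worker processes with the base parallel package. This is only a sketch: the data, the helper one_bootstrap_mean, the seed and the choice of two workers are all assumptions.

```r
library(parallel)

# Hypothetical statistic: the mean of one bootstrap resample
one_bootstrap_mean <- function(i, x) {
  mean(sample(x, replace = TRUE))
}

set.seed(1)
x <- rnorm(50)

cl <- makeCluster(2)                 # two worker processes; adjust to the machine
clusterSetRNGStream(cl, iseed = 42)  # independent, reproducible random streams
boot_means <- parLapply(cl, 1:200, one_bootstrap_mean, x = x)
stopCluster(cl)

boot_means <- unlist(boot_means)
```

Each run depends only on its own resample, which is why this parallelizes cleanly, whereas successive steps of a single MCMC chain depend on each other.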
Namespaces
stats::ks.test
Class S3, method
dropbox: s3.pdf
apropos("^mean\\.")
methods(mean)
methods(class="nls")
getAnywhere(nobs.nls)
stats:::nobs.nls
str
Write a generic function
reveal <- function(x, ...) UseMethod("reveal")
Write method
reveal.default
reveal.digits
class(x) <- c("glm", "digits")
return(x)
print.class
summary.class
plot.class
coef.class
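A minimal sketch of the generic/method pattern for the reveal generic above. The method bodies and the as_digits constructor are invented for illustration and are not taken from the source.

```r
# Generic: dispatches on the class of x
reveal <- function(x, ...) UseMethod("reveal")

# Fallback used when no class-specific method exists
reveal.default <- function(x, ...) {
  cat("An object of class", class(x)[1], "\n")
  invisible(x)
}

# Method for objects whose class vector contains "digits"
reveal.digits <- function(x, ...) {
  cat("digits object with values:", format(unclass(x)), "\n")
  invisible(x)
}

# Hypothetical constructor: prepend "digits" to the existing class vector
as_digits <- function(x) {
  class(x) <- c("digits", class(x))
  x
}

d <- as_digits(c(3, 1, 4))
out_digits  <- capture.output(reveal(d))       # uses reveal.digits
out_default <- capture.output(reveal("text"))  # falls back to reveal.default
```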
Factors
table(x)
unclass(x)
Vignette
browseVignettes(package='knitr')
Rscript
> Rscript blbl.R 'avg'
Rscript in.R > out.txt
Rscript [options] [-e expression] file [args]
#! /usr/bin/Rscript --vanilla --default-packages=utils
args <- commandArgs(TRUE)
res <- try(install.packages(args))
if(inherits(res, "try-error")) q(status=1) else q()
#!/bin/Rscript
args <- as.numeric(commandArgs(TRUE)[1])
Read and Write
read.table
read.csv
readLines
source
dget
load
initial <- read.table("ff", nrows = 100)  # read a few rows to detect column classes
classes <- sapply(initial, class)
taball <- read.table("ff", colClasses = classes)
str(.Platform)
version
Scope
search()
The global environment is searched first and the base package last.
apply
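lapply: applies a function over a list and returns a list
sapply: like lapply, but simplifies the result when possible
apply: applies a function over the rows or columns of a matrix or array
tapply: applies a function to subsets of a vector ("table apply"): tapply(X, INDEX, FUN, simplify = FALSE)
mapply: multivariate version of lapply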
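A quick illustration of each function on toy data; all inputs are made up.

```r
m <- matrix(1:6, nrow = 2)             # rows: (1, 3, 5) and (2, 4, 6)
row_sums <- apply(m, 1, sum)           # margin 1 = rows

lst <- list(a = 1:3, b = 4:6)
l_out <- lapply(lst, mean)             # always a list
s_out <- sapply(lst, mean)             # simplified to a named numeric vector

grp <- c("u", "v", "u", "v")
t_out <- tapply(c(1, 2, 3, 4), grp, sum)  # one sum per group

# mapply walks several vectors in parallel
m_out <- mapply(function(x, y) x + y, 1:3, 4:6)
```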
split
split(x, f, drop=FALSE)
interaction(f1, f2) or list(f1, f2)
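For instance, splitting by one factor and by a combination of two might look like this; the data frame is hypothetical.

```r
df <- data.frame(score = c(10, 20, 30, 40),
                 sex   = c("f", "m", "f", "m"),
                 grp   = c("a", "a", "b", "b"))

# One grouping variable: a list with one element per level
by_sex <- split(df$score, df$sex)

# Two grouping variables: pass a list of factors ...
by_both <- split(df$score, list(df$sex, df$grp))

# ... or build the combined factor explicitly
by_both2 <- split(df$score, interaction(df$sex, df$grp))
```

The combined groups are named by joining the levels with a dot, for example "f.a".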
Plot
panel.function
plotmath
Main title for multiple subplot
par(mfrow = c(2, 2), oma = c(0, 0, 3, 0))
plot...
mtext("main title", outer = TRUE, cex = 1.5)
https://stat.ethz.ch/pipermail/r-help/2008-July/168163.html
Optimization
optim
nlm
optimize
Regex
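^I think : ^ anchors the match at the beginning of a line
morning$ : $ anchors the match at the end of a line
[Bb][u][Ss][hH]
[a-z][0-9][a-zA-Z]
[^?.]$ : line does not end with ? or .
. : any character
flood|fire : alternation, matches either word
^(Good|Bad) : parentheses limit the scope of the alternation
\. : a literal period
( [Ww]\.)? : ? makes the group optional
(.*) : * repeats the preceding item any number of times, including zero
+ : repeats the preceding item at least once
( +[^ ]+ +){1,5} : interval quantifier, here between one and five words
\1, \2 : backreferences to the text matched by the first, second parenthesized group
* is greedy and takes the longest possible match
to make it non-greedy: (.*?)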
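example:
data = readLines("tfsdf.txt")
length(grep("[cC]ause: Shooting", data))
setdiff(i, j) : elements of i that are not in j
value = TRUE : returns the matching elements instead of their indices
grepl : returns a logical vector
regexpr : returns the position and length of the first match in each element
substr(string, begin, end)
regmatches(data, r) : extracts the matches directly, more convenient than substr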
remove objects starting with p
(by LaoMeng from NewSmth):
obj_p = ls(pattern = "^p")
rm(list = obj_p)
sub
sub("fdsfs|fjds", "", x)
gsub : replace not only the first, but all of them
as.Date
regexec : returns the position and length of the match and of each parenthesized sub-expression
regmatches(data, r)
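The functions above can be tried on a few invented strings; the sample lines are hypothetical stand-ins for the contents of a real file.

```r
lines <- c("Cause: shooting", "cause: Shooting at 5pm",
           "Cause: fire", "no match here")

hits_idx <- grep("[cC]ause: [sS]hooting", lines)                # indices
hits_val <- grep("[cC]ause: [sS]hooting", lines, value = TRUE)  # elements
hits_lgl <- grepl("fire|flood", lines)                          # logical vector

# regexpr() locates the first match per element; regmatches() extracts it
r <- regexpr("[0-9]+pm", lines)
times <- regmatches(lines, r)

first_only  <- sub("a", "A", "banana")    # replaces the first match only
all_of_them <- gsub("a", "A", "banana")   # replaces every match

# Greedy versus non-greedy capture, with a backreference in the replacement
greedy <- sub("<(.*)>", "\\1", "<a><b>")
lazy   <- sub("<(.*?)>", "\\1", "<a><b>")
```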
Class and Methods
library(methods)
setClass()
?Classes
?Methods
?setClass, ?setMethod, ?setGeneric
S3
methods("mean")
mean.class
S4
show
showMethods("show")
getS3method(<generic>, <class>)
getMethod(<generic>, <signature>)
setClass("polygon", representation(x="numeric", y="numeric"))
setMethod("plot", "polygon", function(x, y, ...) {
  plot(x@x, x@y, type = "n", ...)
  xp <- c(x@x, x@x[1])
  yp <- c(x@y, x@y[1])
  lines(xp, yp)
})
showMethods("plot")
R workflow
reference : Rob Hyndman
- every paper, book or report is a ‘project’
savepdf
savepdf <- function(file, width = 16, height = 10) {
  fname <- paste("figs/", file, ".pdf", sep = "")
  pdf(fname, width = width/2.54, height = height/2.54, pointsize = 10)
  par(mgp = c(2.2, 0.45, 0), tcl = -0.4, mar = c(3.3, 3.6, 1.1, 1.1))
}
savepdf("fig1")
plot(x, y)
dev.off()
Reduce
To multiply by a matrix repeatedly, e.g. to compute mat^n %*% vec for successive n, use Reduce:
Reduce(f=function(v, x) mat %*% v, x=1:3, init=vec, accumulate=TRUE)
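A runnable version of that call with a small made-up matrix and vector. With accumulate = TRUE the result is a list holding vec, mat %*% vec, and so on.

```r
mat <- matrix(c(1, 1, 0, 1), nrow = 2)   # rows: (1, 0) and (1, 1)
vec <- c(1, 1)

# x = 1:3 only controls the number of steps; its values are ignored
steps <- Reduce(f = function(v, x) mat %*% v, x = 1:3,
                init = vec, accumulate = TRUE)

first <- steps[[1]]         # the initial vector
final <- drop(steps[[4]])   # mat %*% mat %*% mat %*% vec
```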
Matrix computation
square root of a matrix
library(expm)
solve(sqrtm(S))
power of matrix
mat %^% power
File
exist:
file.exists(file)
read multiple files
http://danielmarcelino.com/looping-through-files/
path = ""
out.file = NULL
file.names = dir(path, pattern = "\\.txt$")
for (i in seq_along(file.names)) {
  file = read.table(...)
  out.file = rbind(out.file, file)
}
write.table(out.file, ...)
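A self-contained variant of the same idea, as a sketch: it writes two small files into a scratch directory first so that it can run anywhere. The directory, file names and columns are all hypothetical.

```r
# Hypothetical scratch directory with two example input files
tmp_dir <- file.path(tempdir(), "txt_demo")
dir.create(tmp_dir, showWarnings = FALSE)
write.table(data.frame(id = 1:2, value = c(0.5, 1.5)),
            file.path(tmp_dir, "a.txt"), row.names = FALSE)
write.table(data.frame(id = 3:4, value = c(2.5, 3.5)),
            file.path(tmp_dir, "b.txt"), row.names = FALSE)

# Read every .txt file and stack the pieces in one step
file_names <- list.files(tmp_dir, pattern = "\\.txt$", full.names = TRUE)
pieces <- lapply(file_names, read.table, header = TRUE)
combined <- do.call(rbind, pieces)

out_path <- file.path(tmp_dir, "combined.out")
write.table(combined, out_path, row.names = FALSE)
```

Collecting the pieces in a list and binding once avoids growing a data frame inside the loop.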
dyn.load
is.loaded("function name") checks whether a compiled symbol is available.
If the function comes from Fortran, the function name should be given in lowercase.
Bootstrap
library(boot)
betafun = function(data, indices, formula) {
d = data[indices, ]
return(lm(d[,1] ~ d[,2], data = d)$coef[2])
}
bootbet = boot(data = dat, statistic = betafun, R = 1000)
names(bootbet)
plot(bootbet)
hist(bootbet$t, breaks = 100)
boot.ci(bootbet, conf = .95, type = "basic", index = 1) ## type: "norm", "basic", "stud", "perc", "bca" or "all"
MLE
http://www.mayin.org/ajayshah/KB/R/documents/mle/mle.html
optim()
nlminb()
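As one possible illustration, a normal model could be fitted by minimizing the negative log-likelihood with optim(). The simulated data, starting values and log-sigma parameterization are choices made for this sketch only.

```r
set.seed(123)
x <- rnorm(500, mean = 2, sd = 1.5)   # simulated data

# Negative log-likelihood of a normal model; par = c(mu, log(sigma))
neg_loglik <- function(par, data) {
  mu <- par[1]
  sigma <- exp(par[2])   # optimize log(sigma) so that sigma stays positive
  -sum(dnorm(data, mean = mu, sd = sigma, log = TRUE))
}

fit <- optim(par = c(0, 0), fn = neg_loglik, data = x, method = "BFGS")
mu_hat <- fit$par[1]
sigma_hat <- exp(fit$par[2])
```

For this model the estimates should agree closely with the sample mean and the uncorrected (divide by n) standard deviation.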
Difference between require and library
require is often used within functions. If the package is not found, it gives a warning and returns FALSE, whereas library gives an error.
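The difference can be seen with a package name that is assumed not to be installed; the name below is made up.

```r
pkg <- "aPackageThatDoesNotExist123"   # hypothetical, assumed not installed

# require() warns and returns FALSE, so the script keeps running
ok <- suppressWarnings(require(pkg, character.only = TRUE, quietly = TRUE))

# library() signals an error, caught here so the example can continue
lib_result <- tryCatch(
  {
    library(pkg, character.only = TRUE)
    "loaded"
  },
  error = function(e) "error"
)
```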
Robust Code Style
Use the sig, assertive and testthat packages.
Reference
see dropbox / R / notes:
- writing efficient and parallel code in R
Read Excel
library(gdata)
dat1 <- read.xls("haha.xls", sheet = 1)
or XLConnect (recommended):
require(XLConnect)
wb <- loadWorkbook(system.file("demoFiles/mtcars.xlsx", package = "XLConnect"))
wb <- loadWorkbook("myfile.xlsx")
lst = readWorksheet(wb, sheet = getSheets(wb))
lst = readWorksheet(wb, sheet = "Sheet1")
Subset
subset(dat, ID=='minzhao')
reshape
l <- reshape(hsb2,
             varying = c("read", "write", "math", "science", "socst"),
             v.names = "score",
             timevar = "subj",
             times = c("read", "write", "math", "science", "socst"),
             new.row.names = 1:1000,
             direction = "long")
Check package Version
packageVersion("twitteR")
sessionInfo()
Plot with New window
x11()
or
dev.new()
Date
reference: http://www.statmethods.net/input/dates.html, http://www.ats.ucla.edu/stat/r/faq/dates.htm, http://www.stat.berkeley.edu/classes/s133/dates.html
as.Date, strftime, ISOdatetime
days <- mydates[1] - mydates[2]
Sys.Date(): today's date
date(): current date and time
dat <- mutate(dat, newdate = as.POSIXct(strptime(paste(date, time, sep=' '), '%Y-%m-%d %H:%M:%S')))
format
Symbol | Example |
---|---|
%d | 01-31 |
%a | Mon |
%A | Monday |
%m | 01-12 |
%b | Jan |
%B | January |
%y | 07 |
%Y | 2007 |
as.Date("01/30/2014", "%m/%d/%Y")
functions
weekdays(dat)
months(dat)
hist(rdates, "months", format = "%d %b")
range(date)
mean(date)
difftime(b2, b1, units = 'weeks')
seq(date, by = 'days', length = 10) ## by = '2 weeks'
table(format(data, '%A'))
Friday Monday Thursday Tuesday Wednesday
2 3 1 2 3
cut(date, "year")
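A short sketch tying several of these date functions together; the two dates are arbitrary examples.

```r
# The format string must match the input: month/day/year here
d1 <- as.Date("01/30/2014", format = "%m/%d/%Y")
d2 <- as.Date("2014-03-15")

gap_days  <- d2 - d1                              # difftime in days
gap_weeks <- difftime(d2, d1, units = "weeks")    # same gap in weeks

next_days  <- seq(d1, by = "day", length.out = 3) # three consecutive dates
year_label <- format(d1, "%Y")                    # four-digit year as text
month_bins <- cut(c(d1, d2), "month")             # factor labelled by month start
```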