NLMIXED

METHOD

  1. DEFAULT: [GAUSS] adaptive Gauss-Hermite Quadrature
  2. [FIRO] : First-Order Method : only for Normal outcome and Random is used
  3. [HARDY] : Hardy Quadrature : only ONE Random

For GAUSS and ISAMP, NO ADAPTIVE QUADRATURE and NO ADAPTIVE SCALING can also be used (noad and noadscale)

PDF

can use y ~ negbin(m, p), or ll=logpdf('negbinomial', y, p,n) then model y~general(ll). but can not use pdf, log scale is needed.

TROUBLESHOOT

  1. sort by subject
  2. no misspell in PARAM
  3. no same name in data set
  4. reparamatrize : exp(2\*logsd_u)
  5. rescale
  6. GCONV = 0 makes FCONV; change to GCONV = 0.01 might help
  7. look at convergence : change covnrgence criteria
  8. try simpler model first
  9. PARMS for starting point
  10. try previous different convergence methods

SORT

PROC SORT DATA = INPUT OUT = OUTPUT [NODUPLICATES];
	BY [DESCENDING] VAR [VAR2] [_ALL_];
RUN;

UNIVARIATE

HISTOGRAM

proc univariate data=Steel;
	var Length Width;
	histogram;
	ods output moments  = a;
run;

MEANS

PROC MEANS DATA = KISD NWAY;
	CLASS GROUP;
	VAR age wt;
	OUTPUT OUT = outdata MEAN = aveage avgwt;
RUN;

by group;
output out = a mean=;

sort in advance

ods output summary = y;
...

proc transpose;
run;

GPLOT

symbol1 I=R V=circle interpol = join;

PROC GPLOT DATA=a;
	plot avgfall*time avginj*time;
RUN;

	plot (fall inj) * time = categories;

sysbol

v(value) = dot/circle
h(size) = 0.1/1/3

axis

axis1 order =(1 to 5 by 2, 10 to 100 by 10) label = (a=90 'y axis');

then

plot a*b /haxis = axis1 vaxis = axis2 legend =legend;

Double axis

plot y*x;
plot2 y2*x;

overlay

plot ... /overlay;

legend

title 'sss';
legend1 value = ("va1" "va2");

plot ... /overlay legend = legend1;

FREQ

PROC FREQ DATA=BABA NLEVELS; /nlevels for check how many leveles/
	TABLES idunit;
RUN;

ods select nlevels;

Options for tables:

tables A /missprint; displays missing
tables A /missing; treat missing as a level

Use own format

proc format;
value bedscat
	1='<100'
	2='100-400'
	3='>=400';
value owner
	1='Public'
	2='For Profit'
	3='Not For Profit';
value teaching
	1='COTH'
	2='Residency, Not COTH'
	3='Nonteaching';
run;

proc freq data=firsthosp nlevels;
tables bedscat owner teaching;
format bedscat bedscat. owner owner. teaching teaching.;
run;

SURVEYSELECT

PROC SURVEYSELECT METHOD = SRS (URS) SAMPLESIZE = 10 REP = 1 SEED = 110 OUT = datout;
	id _all_ ;
RUN;

or use samprate = 0.4

NLMIXED vs GLIMMIX

Syntax

proc glimmix data=fake;
class treat ident;
model y/m = treat / ddfm=residual;
random ident;
parms (0.25) (1) / hold=2;
error=binomial, link=logit;
run;

proc nlmixed data=fake tech=trureg df=297;
bounds sigma2>0;
parms mu=1 t1=-2 t2=2 sigma2=0.25;
if treat=1 then eta=mu + t1 + e;
else if treat=2 then eta=mu + t2 + e;
else eta=mu + e;
prob=exp(eta)/(1+exp(eta));
model y ~ binomial(m, prob);
random e ~ normal(0, sigma2) subject=ident;
contrast 'treat' t1, t2;
predict prob out=results;
run;

bounds

bounds -1 < rho < 1;

Pros

  • Likelihood can be user-specified if distribution is non-standard (ZI-NB)
  • More exact than glimmix or nlinmix and runs faster than both of them. Some say slower.
  • GLIMMIX: pseudo-likelihood vs NLMIXED: true log likelihood MLE (approximate) by Adaptive Gauss-Hermite quadrature. This estimation can also be available in GLIMMIX (method = QUAD in GLIMMIX), which can do nested model comparison
  • GLIMMIX dependent variables have to be from an exponential distribution
  • For marginal log likelihood approximation: NLMIXED more accurate (integral approximation by Gaussian quadrature), where GLIMMIX by linearization, potentially biased estimates for both fixed effects and covariane parameters, especially for binary data.
  • GLIMMIX wald tye test and confidence intervals and nested models can not be compared with true likelihood ratio test
  • INITIAL VALUE: NLMIXED > GLIMMIX; For GLIMMIX, more sensitive to initial values

Cons

  • Random effects must come from a (multivariate) normal distribution
  • All random effects must share the same subject (i.e. cannot have multi-level mixed models)
  • Random effects cannot be nested or crossed
  • DF for Wald-tests or contrasts generally require manual intervention
  • Statistically, GLIMMIX can handle more random effects than NLMIXED.
  • GLIMMIX allow REML , better for number of higher level units is small. NLMIXED no REML

Detail

There are some problems which GLIMMIX can handle with much greater facility than NLMIXED (e.g., models with both random effects and repeated measurements). On the other hand, the NLMIXED procedure may handle some problems which the GLIMMIX procedure cannot handle very well. For instance, you might have a study which is subject to random effects and your response follows a zero-inflated negative binomial distribution. The GLIMMIX procedure would not be able to fit a zero-inflated negative binomial response, but the NLMIXED procedure could be employed.

In general, the NLMIXED procedure is considerably more computationally intensive than is the GLIMMIX procedure, even though the GLIMMIX procedure is doubly iterative. Seldom will the NLMIXED procedure require less computational resources than GLIMMIX for equivalent problems.

The GLIMMIX procedure also fits mixed models for nonnormal data with nonlinearity in the conditional mean function. In contrast to the NLMIXED procedure, PROC GLIMMIX assumes that the model contains a linear predictor that links covariates to the conditional mean of the response. The NLMIXED procedure is designed to handle general conditional mean functions, whether they contain a linear component or not. As mentioned earlier, the GLIMMIX procedure by default estimates parameters in generalized linear mixed models by pseudo-likelihood techniques, whereas PROC NLMIXED by default performs maximum likelihood estimation by adaptive Gauss-Hermite quadrature. This estimation method is also available with the GLIMMIX procedure (METHOD=QUAD in the PROC GLIMMIX statement).

Summary

  1. NLMIXED:
    • model is simple and limited to two levels
    • number of random effects is small
    • number of level-2 groups is relatively large
    • binary data require accurate covariance parameter
    • user-defined response distribution
    • nested models need to be compared using the likelihood ratio
  2. GLIMMIX:
    • more complex models
    • a large number of random effects
    • more than two levels
    • number of higher-level units is small

GLIMMIX TROUBLESOME

  • Use a different METHOD= option, such as LAPLACE, QUAD, RMPL in PROC GLIMMIX, if possible.
  • INITGLM option in the proc GLIMMIX statement
  • TECH = NRRIDG or TECH = NEWRAP , in NLOPTIONS statement: especially useful for Binary, Binomial, Negative Binomial, and Poisson distribution
  • increase the number of optimizations using the MAXOPT=options (50 ) GLIMMIX STATEMENT
  • increase the number of iterations using the MAXITER = option in NLOPTIONS
  • if convergen > 1e-3, use

      nloptions gconv = 0
    
  • initial starting values in the PARMS statement

Sample:

proc glimmix startglm inititer=10 method=laplace/rmpl/quad;
  class clinic A;
  model y/n = A / link=logit dist=binomial;
  random clinic;
  parms (0.4) / noiter;
  nloptions technique=newrap;
run;

Memory Issue

  1. use DDFM=RESIDUAL or DDFM = BW in model statement would exponentially reduce the memory, especially for many level random effect (BW)
  2. NOTEST in model statement
  3. random , TYPE=VC
  4. NOCLPRINT
  5. NO SOLUTION
  6. hold random covariance
  7. Try HPMIXED

Model

A|B = A B A*B

Last Obs

data temp;
	set mylib.tourrevenue;
	retain HoldRevenue;
	Revenue = LandCost * NumberOfBookings;
	output;
	HoldRevenue = Revenue;
run;

or Use lag function and dif

Data teresaserial;
set teresa;
by idunit yearmonth;
lag_inj = lag(injfallscount);
dif_inj = dif(injfallscount);
if first.idunit then lag_inj = .;
run;

proc expand data=unemp out=unemp_laglead method = none;
id date;
convert rate = rate_lag1   / transformout=(lag 1);
convert rate;
convert rate = rate_lead1  / transformout=(lead 1);
run;

Count

data new;
set temp;
by group;
cnt + 1;
if first.group then cnt = 1;
run;

proc sort;
by group descending cnt;
run;

data new2;
set new;
by group;
retain totcnt;
if first.group then totcnt = cnt;
output;
run;

Beta-Binomial

logit link

proc nlmixed data=cardtox method=GAUSS NOAD fd qpoints=30;
parms b0=-0.24  b_dose=1.06 b_time=-0.07  rho=0.11;
pi_i = exp(b0 + b_dose*dose + b_time*time)/
(1 + exp(b0 + b_dose*dose + b_time*time)) ;
alpha_i1 =  pi_i*(1-rho)/rho  ;
alpha_i2 =  (1- pi_i)*(1-rho)/rho ;
like1 = gamma(alpha_i1 + y)*gamma(alpha_i2+n-y)/
gamma(alpha_i1 + alpha_i2 + n ) ;
like2 = gamma(alpha_i1) * gamma(alpha_i2)/
gamma(alpha_i1 + alpha_i2) ;
llik = log(like1) - log(like2);
model z  ̃ general(llik);
run;

whether z or y does not matter, and no need to add (n choose y). can use lgamma.

ODS

cheatsheet: http://support.sas.com/rnd/base/ods/scratch/ods-tips.pdf

can create pdf and html

ods listing close;
ods pdf file="" startpage = no;
odf pdf close;
ods listing;

for html, change pdf to html

ods html style=banker path="C:\" gpath="images" body="mygraph.html";

check table name :

ods trace on;
proc ...
ods trace off;

then check log for table name

otherwise,

ods output name = yourdataset;

Mixture Model

  1. nlmixed
  2. FMM (finite mixture model)
  3. R library : flexmix
  4. IML

In nlmixed proc, specify the mixture likelihood, and get the estimates by maximizing the likelihood

pros:

  1. straight forward
  2. can do random effect model
  3. more controls for component likelihood (say same alpha in NB)

cons:

  1. highly sensitive to initial value
  2. weird behavior (p estimates are the exact initial values)

FMM:

can do ZIP, ZINB, whatever mixture model, can also specify criterion to find the best model

pros:

  1. easy to implement

cons:

  1. does not fit models for multinomial data or models with random effects
  2. User-defined distributions or link functions are not supported

initial value :

proc fmm;
   model y = x1 x2 / k=3;
   probmodel  / parms(2, 1);
run;

TECH=keyword

  • CONGRA performs a conjugate-gradient optimization.
  • DBLDOG performs a version of double-dogleg optimization.
  • NEWRAP performs a Newton-Raphson optimization combining a line-search algorithm with ridging.
  • NMSIMP performs a Nelder-Mead simplex optimization.
  • NONE performs no optimization.
  • NRRIDG performs a Newton-Raphson optimization with ridging.
  • QUANEW performs a dual quasi-Newton optimization.
  • TRUREG performs a trust-region optimization.

Dummy Variable

array studyq(17) studyq\_1 - studyq\_17;
do i = 1 to 17;
studyq(i) = (studyqtr = i);
end;

IML/MLE

http://www.psych.yorku.ca/lab/sas/iml.htm

Use IML to do MLE:

call an optimization routine:

nlpqn(rc, xr, "fun", x, option, con) grd = "grad";
nlpnra : newton-raphson

con = {lower bound, upper bound}
con = {. 0, . .};

opt = {1 (to find max), 4 (print a lot of output)};

a simple example:

proc iml;
start ll(param) global (x) ;
mu = param[1];
sigma2 = param[2]##2;
n = nrow(x);
c = x - mu;
f = -n/2*log(sigma2) - 1/2/sigma2*sum(c##2);
return (f);
finish;

use sim;
read all var {yi} into x;
close sim;

p = {1, 2};
opt = {1, 4};
call nlpnra(rc, result, "ll", p, opt);
quit;

Read data into iml

use sim ;						/* Input the simulated dataset	 */;
read all var {one u} into x ;	/* Read the design matrix into x   */;
read all var {yi}	into y ;	/* Read the outcome data into y	*/;
read all into mat[rownames=name];
real all into dat;
close sim;

all: all obs

read all into mat[rowname=name] where sex='M';

Output

output to a SAS dataset

xys = yhat || res || weight;
cname = {"_YHAT_" "_RESID_" "_WEIGHT_" };
create out from xys [ colname=cname ];
append from xys;

CREATE libref.dataset <FROM matrix <[r=vector1 c=vector2]>>;

Variable

p = 2;
vec = {1, 2};
mat = j(row, col, num); constant matrix
mat = {2 4, 3 1};
nrow(mat)
mat[i, ]
vec[i]

index = 1:5 ;
col = 4:6` ; 3*1 matrix
vars = 'XX1':'XX7'

I(6);
diag(vector);
diag(mat);
t(mat) or mat` ; transpose

combine data

x = x0 || x1 || x2;

new row

mat = mat//submat[+, 2];

Index

ind = {2 3}
mat[ind, ind];
mat[1, 1:3]

column sum:

b = a[+,]

column sum of squares:

b = a[##,]

row mean:

b = a[, :]

sum is quicker than a[+,+].

LOC

location:

loc(a[,2] ^= .)

Matrix

coffee[r = names c=days] ; # colnames and rownames

print coffee[r=names c=days] weektot[format=dollar7.2] ,
daytot[c=days f=dollar8.2] '  ' total[f=dollar7.2];

reshape ( stored in row-major order) :

a = shape(x, 4, 3);

x = a || b || c;

math

power: x##2

elementwise product #

matrix product *

Condition (IF)

if expression THEN statement1;
ELSE IF expression THEN statements2;
ELSE statements;

loop (DO)##

do i = 1 to p by 2;
end;

DO WHILE (expression);
end;

DO UNTIL (expression);
end;

function (Modules)

start myfun(param) global (x);
return (f);
finish myfun;

eval function:

run myfun(param);
call nlpnra(rc, result, "ll", p, opt);

a = myfun(param)

print

print 'text' outcome;

Random

RANUNI(SEED) ; uniform(0, 1)
RANNORM(SEED) ; standard normal
RANBIN(SEED, n, p) ; binomial n, p

Random number generator

In IML

n = 5; m = 4;
x = j(n,m);
Mu = 1:m;
Sigma = (1:m)/m;
call randgen(x, "Normal", Mu, Sigma);
print x;

NLMIXED TEST

CONTRAST 'mean' b1-b2=0, b2-b3=0;

TROUBLESOME

Unable to read sas report file

This is an error in enterprise 4.2.

Ref: http://blogs.sas.com/content/bi/2009/09/03/differences-in-sas-enterprise-guide-4-1-and-4-2/

Change options to show in HTML: Tools > Results > Results General

Aggregate

Means

proc means ;
class ;
var;
output out = dd sum=;
run;

Summary

proc summary data = SampleData;
var Y2010 Y2011 Y2012;
output out=ProcSumOut sum=;
run;

SQL

proc sql outobs=12;
create table name as
select City, Country, mean(AvgHigh, AvgLow)
      as MeanTemp
from sql.worldtemps
group by xx
where calculated MeanTemp gt 75
order by MeanTemp desc;
quit;

Count unique levels:

proc sql;
create table new as
select count(distinct(name)) as namecount,
        count(distinct(age)) as agecount,
        count(distinct(sex)) as sexcount
        from sashelp.class;
quit;

Summary of Data

AVG, MEAN
COUNT, FREQ, N
CSS : corrected sum of squares
CV: coefficient of variation
MAX
MIN
NMISS: number of missing
PRT
RANGE
STD
STDERR : se of the mean
SUM
SUMWGT
T: testing the hypo that mean is zero
USS: uncorrected sum of squares
VAR

Optimization

http://blogs.sas.com/content/iml/2011/10/12/maximum-likelihood-estimation-in-sasiml/

  1. IML for log likelihood
  2. constraints
  3. nonlinear optimization (nlpnra (newton-raphson), etc)

available subroutines

http://support.sas.com/documentation/cdl/en/imlug/64248/HTML/default/viewer.htm#imlug_nonlinearoptexpls_sect001.htm

  • NLPCG: Conjugate Gradient Method
  • NLPDD: Double Dogleg Method
  • NLPNMS: Nelder-Mead Simplex Method
  • NLPNRA: Newton-Raphson Method
  • NLPNRR: Newton-Raphson Ridge Method
  • NLPQN: (Dual) Quasi-Newton Method
  • NLPQUA: Quadratic Optimization Method
  • NLPTR: Trust-Region Method

A least squares problem is a special form of minimization problem where the objective function is defined as a sum of squares of other (nonlinear) functions.

  • NLPLM: Levenberg-Marquardt Least Squares Method
  • NLPHQN: Hybrid Quasi-Newton Least Squares Methods

Usage

nlpnra(rc, result, "FUN", initial, opt, con)

return codes (RC)

  1. >0 : converge
  2. <0 : not converge

Termination criteria (TC)

local vs global

All the IML optimization algorithms converge toward local rather than global optima.

One way to find out whether the objective function has more than one local optimum is to run various optimizations with a pattern of different starting points. For a more mathematical definition of optimality, refer to the Kuhn-Tucker theorem in standard optimization literature. Using rather nonmathematical language,

OPT

  1. opt[1]:
    • 1: max
    • 0: min
    • m: least squares
  2. opt[2]: print options

Merge

data new;
	merge old1 (rename = (y = y1))
		old2 (rename = (y = y2));
	by id;
run;

If only want matched pair in old1:

merge old1 (in = x) old2;
if x = 1;

Macro

%MACRO myfun(a = , b = );
...
%MEND myfun;

%myfun(a = , b= );

Random number

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a001466748.htm

x = RAND('NORMAL',<,[Theta],[lambda]>)

summation by group in IML

  proc iml;
    use pottery;
    read all var{site x fe} into a;
	print a;
/*In these statements you should not need to change anything other than the matrix name.*/
    unique_rows=uniqueby(a,1,1:nrow(a));
print unique_rows;
    do i=1 to nrow(unique_rows);
      /* Next line is for the last BY group */
      if i=nrow(unique_rows) then index=unique_rows[i]:nrow(a);
      else index=unique_rows[i]:unique_rows[i+1]-1;
      submat=a[index,];
/*Below is where you put your function. This particular example computes the sum for X and FE in each BY group.*/
	/*          sum=sum//submat[+,2:3];*/
sum = sum//submat[+, 2];
    end;
    print sum;
    quit;

Or:

proc iml;
a={3 9 3 1,
3 0 2 1,
2 8 3 2};
c=ncol(a);
group=1;
b=a[loc(a[,c]=group),1:(c-1)][+,1:3];
print b;
quit;

Two sample proportion

proc freq data=ownerdata;
weight Count;
table source * owner / chisq;
where source = 'aha' or source = 'uti';
run;

Two sample ttest

proc ttest data=babies plots(unpack)=summary;
class momsmoker;
var bwt;
run;

Transpose

proc transpose;
by group;
id **;
var **;


Published

12 January 2013

Tags