Rotating Panels using the R package rjssf

by David De Antonio Liedo

Here, we show how to specify the complex structure of the measurement errors that may appear in surveys with a rotating panel design. I briefly describe the model for the errors proposed by ELLIOTT, D. (2017), and I will show you how to specify it with the R package rjssf, which exploits the Java libraries defining the state-space framework (SSF) of JDemetra+. A completely unrelated SSF application regarding a structural model to forecast inflation can be found here.

Understanding the correlation patterns accross waves

The autocorrelated wage specific errors are driven by a proportion of individuals that do not update their responses on the basis of new information, so in principle it is possible that first wave individuals’ responses persist until the last wave $W$. However, it is typically assumed that the error in the first response is correlated to the error in the second response, but uncorrelated to the error in the third response. Thus, the first row of the table below represents all the correlation patterns we want to model in the case of a panel defined by $ W=5 $ (number of waves) and $ nlags=3 $ (survey period):

Input Correlation Matrix with $ W=5 $ and $ nlags=3 $				$\varepsilon^{(2)}_{t}$	$\varepsilon^{(3)}_{t}$	$\varepsilon^{(4)}_{t}$	$\varepsilon^{(5)}_{t}$
$\varepsilon^{(1)}_{t-3}$	$\varepsilon^{(2)}_{t-3}$	$\varepsilon^{(3)}_{t-3}$	$\varepsilon^{(4)}_{t-3}$	$\phi^{2}_{1}$	$\phi^{3}_{1}$	$\phi^{4}_{1}$	$\phi^{5}_{1}$
	$\varepsilon^{(1)}_{t-3\times 2}$	$\varepsilon^{(2)}_{t-3\times 2}$	$\varepsilon^{(3)}_{t-3\times 2}$	0	$\phi^{3}_{2}$	$\phi^{4}_{2}$	$\phi^{5}_{2}$
		$\varepsilon^{(1)}_{t-3\times 3}$	$\varepsilon^{(2)}_{t-3\times 3}$	0	0	$\phi^{4}_{3}$	$\phi^{5}_{3}$
			$\varepsilon^{(1)}_{t-3\times 4}$	0	0	0	$\phi^{5}_{1}$

Estimating the error component structure

The errors are unobserved components. If we were able to observe them, we would remove them from the data. Here, we are going to estimate the correlation parameters above assuming we have a direct observation of the errors, i.e. we simulate the errors using $\phi^{2}_{1}=0.2$, $\phi^{3}_{1}=0.3$, $\phi^{4}_{1}=0.7$, $\phi^{5}_{1}=0.9$ (in this example, the fith time individuals respond, their error is highly correlated with the one in the previous survey period). Also, $k^{(i)}=1$. We will assume the error has the following structure:

\[e^{(i)}_{t} = b^{(i)}_{t} + \varepsilon^{(i)}_{t}\]

where the first term follows a random walk with innovation variances given by $\sigma^2_{2}=0.001$, $\sigma^2_{3}=0.01$, $\sigma^2_{4}=0.02$, $\sigma^2_{5}=0.03$

# Let's load an excel file with simulated data 
data<-read_excel("../Data/simulWaves.xlsx")
# Convert it into a matrix named "errors"
#(time in the rows and errors corresponding to 5 waves in the columns)
errors<-as.matrix(data[,1:5])

Bias component

The first term follows a random walk process and it represents bias:

$b^{(i)}_{t} = b^{(i)}_{t-1} + w^{(i)}_{t}$,

where $w^{(i)}_{t} \sim N\left(0, \sigma^2_{i} \right)$

Most of the times, this component cannot be identified together with additional trend components. Thus, one identification restriction is to assume that the bias terms for all weights is equal to zero:

\[b^{(1)}_{t} = -\sum_{i=2}^{W} b^{(i)}_{t}\]

Alternatively, one could assume that there is no bias in the first wave.

This is how we would specify the bias components:

model<-jd3_ssf_model()
# bias (we do not specify the first one since it is either zero, or the sum of the rest)
add(model, jd3_ssf_locallevel("b2"))
add(model, jd3_ssf_locallevel("b3"))
add(model, jd3_ssf_locallevel("b4"))
add(model, jd3_ssf_locallevel("b5"))

Autocorrelation component

The second term represents an autocorrelated wave specific survey error. Given that each wave $i$ for time $t$ comprises a group of individuals that has been surveyed for the $ith$ time, and the same group of individuals responded during the previous survey period (i.e. $t-nlags$) for the $i-1$th time, a correlation pattern may arise when the responses are not updated efficiently. Thus, the error in wave $i$ for time $t$ may be correlated with the error in wave $i-1$ for time $t-nlags$, for $i>1$ (i.e. the first response of the individuals may be biased, but it is not correlated with previous responses, simply because it is the very first time they are asked to respond):

$\varepsilon^{(i)}_{t} = \phi^{(i)}_{1} \varepsilon^{(i-1)}_{t-nlags} + \epsilon^{(i)}_{t}$,

where $\epsilon^{(i)}_{t} \sim N\left(0, (1-(\phi^{(i)}_{1})^{2})(k^{(i)})^{2} \right)$. Note that $\phi^{(i)}_{1}=0$ for $i=1$.

This component is specified using the function jd3_ssf_msae, which uses as an argument ar initial values for the first row of the table above (although one could input the whole matrix, which would be upper triangular). Note that lag=3 indicates that the survey period (e.g. one quarter) corresponds to three periods of the base frequency (e.g. monthly), and corrComponent is simply the name we give to this component $\varepsilon^{(i)}_{t}$

# correlation:
ar=0.5
mar<-matrix(ar, nrow = 1, ncol=4)
add(david, jd3_ssf_msae("corrComponent", nwaves=5, ar=mar, fixedar = FALSE, lag=3))

Measurement equations

Estimating this model requires the specification of the measurement equations, which link our simulated errors to the various unobserved components defined above. The first equation, i.e. eq1, represents the error associated to the first wave: $e^{(1)}_{t} = -\sum_{i=2}^{W} b^{(i)}_{t} + \varepsilon^{(1)}_{t}$. The subsequent equations are simply $e^{(i)}_{t} = b^{(i)}_{t} +\varepsilon^{(i)}_{t}$, where the terms $\varepsilon^{(i)}_{t}$ have the structure we have defined above

Note that the line add(eq1, "sae", coef=1,fixed=T,loading=jd3_ssf_loading(0)) simply adds the first element (i.e. $\varepsilon^{(1)}_{t}$) of the state equation of the autocorrelated component. Equivalently, add(eq2, "sae",coef=1,fixed=T, loading=jd3_ssf_loading(1)) adds the second element (i.e. $\varepsilon^{(1)}_{t}$. The key is to understand the state space form of this autocorrelated error component, and to realize that the function jd3_ssf_loading(k) takes the (k+1)th element of the state vector

# error of wave 1 
eq1<-jd3_ssf_equation("eq1")
add(eq1, "b2", -1)
add(eq1, "b3", -1)
add(eq1, "b4", -1)
add(eq1, "b5", -1)
add(eq1, "sae", coef=1,fixed=T,loading=jd3_ssf_loading(0)) 
add(model, eq1)
# error of wave 2
eq2<-jd3_ssf_equation("eq2")
add(eq2, "b2")
add(eq2, "sae",coef=1,fixed=T, loading=jd3_ssf_loading(1))
add(model, eq2)
# error of wave 3
eq3<-jd3_ssf_equation("eq3")
add(eq3, "b3")
add(eq3, "sae", coef=1,fixed=T,loading=jd3_ssf_loading(2))
add(model, eq3)
# error of wave 4
eq4<-jd3_ssf_equation("eq4")
add(eq4, "b4")
add(eq4, "sae", coef=1,fixed=T,loading=jd3_ssf_loading(3))
add(model, eq4)
# error of wave 5
eq5<-jd3_ssf_equation("eq5")
add(eq5, "b5")
add(eq5, "sae", coef=1,fixed=T,loading=jd3_ssf_loading(4))
add(model, eq5)

Estimate the model The estimation is performed in a matter of seconds:

# estimate
modelResults<-estimate(david, errors, marginal=T, concentrated=F)
# plot results
result(modelResults,"parameters") 
result(modelResults,"parameternames")

However, the autocorrelation coefficients turn out to be smaller than those used to generate the process.

[Bring here the coefficients used to simulate the data, and compare with the actual outcome in the form of a table]

Discussion

The method is efficient, but parameter estimates are inconsistent. [plot likelihood for simulated data in function of corr coefficients]

References

HARVEY, A. and C.H. Chung(2000), Estimating the underlying change in unemployment in the UK, Journal of the Royal Statistical Society: Series A, Volume 163, p. 303-339
van den BRAKEL, J. and S. Krieg (2009), Structural time series modelling of the monthly unemployment rate in a rotating panel] , Discussion Paper, 09031
ELLIOTT, D. (2017), Increasing frequency and improving timeliness of key variables in the UK Labour Force Survey, University of Southampton

Posted on 19 Feb 2019

rotating panels R states space models rjssf