Regression-Adjusted Real-Time Quality Control

Department of Laboratory Medicine, Zhongshan Hospital, Fudan University; IT Center, Zhongshan Hospital, Fudan University; Roche Diagnostics, China, Shanghai.

BACKGROUND:

Patient-based real-time quality control (PBRTQC) has gained increasing attention in the field of clinical laboratory management in recent years. Despite the many upsides that PBRTQC brings to the laboratory management system, it has been questioned for its performance and practical applicability for some analytes. This study introduces an extended method, regression-adjusted real-time quality control (RARTQC), to improve the performance of real-time quality control protocols.

METHODS:

In contrast to PBRTQC, RARTQC adds a regression adjustment step before a common statistical process control algorithm, such as the moving average, is used to decide whether an analytical error exists. We used all patient test results for 4 analytes from 2019 at Zhongshan Hospital, Fudan University, to compare the performance of the 2 frameworks. Three types of analytical error were added to compare the PBRTQC and RARTQC protocols: constant, random, and proportional errors. The protocols were assessed with the false alarm rate and error detection charts.

RESULTS:

The study showed that RARTQC outperformed PBRTQC. Compared with PBRTQC, RARTQC improved the trimmed average number of patients affected before detection (tANPed) at total allowable error by about 50% for both constant and proportional errors.

CONCLUSIONS:

The regression step in the RARTQC framework removes autocorrelation in the test results, allows researchers to incorporate additional variables, and improves data transformation. RARTQC is a powerful framework for real-time quality control research.

Introduction

Patient-based real-time quality control (PBRTQC) has gained increasing attention in recent years. Many reports have described the advantages of implementing PBRTQC as a secondary tool alongside internal quality control (IQC) to minimize the risk of quality incidents. PBRTQC monitors analytical processes continuously, is not affected by the choice of control materials, and incurs only development or software costs.

However, many issues regarding the performance of PBRTQC have not yet been satisfactorily addressed. Recent studies by our team (5) and by Bietenbeck et al. (6) have pointed out the ineffectiveness of PBRTQC protocols for analytes with a high robust normalized spread or a highly skewed, dispersed distribution, such as alanine aminotransferase (ALT). We also reported a potential loss of performance of optimized PBRTQC when an analyte shows naturally occurring or clinically nonsignificant trends or seasonality. Such autocorrelation can make the false alarm rate (FAR) unpredictable and the curves in the error detection chart asymmetric for the testing set or future data (5). In addition, most efforts to develop PBRTQC protocols have focused on univariate statistical process control (SPC) algorithms. Ng et al. (7) first developed subpopulation protocols for hospitalized and ambulatory patients to incorporate clinical information. Their study showed that subpopulation protocols, such as a protocol for hospitalized patients in the morning, can improve performance, most notably the average number of patients affected before detection (ANPed). However, subpopulation protocols are a cumbersome way to add multivariate information to PBRTQC: with more than 2 grouping variables, the approach becomes inefficient to implement in the real world.

A major attempt to solve the problem of autocorrelation in SPC applications was to use a time-series model to remove the autocorrelation effects and to build control rules on the residuals of the model. Alwan and Bissell applied this theory to IQC in the clinical laboratory in the 1980s (8). However, their approach was limited to IQC data, which are very different from patient data. An additional benefit of a regression model, not recognized in Alwan and Bissell’s work, is that further variables can be added to adjust the test results more accurately, providing a convenient way to account for variables such as sex and other clinical information. Inspired by Alwan and Bissell’s approach, we propose a novel framework, regression-adjusted real-time quality control (RARTQC), to improve performance.

Materials and Methods

DATA

We used all 2019 test results for serum sodium (N = 79 587), chloride (N = 79 588), ALT (N = 328 883), and creatinine (N = 418 494) from the Department of Laboratory Medicine at Zhongshan Hospital, Fudan University. The data were extracted with anonymization. Use of the data and additional clinical information without informed consent was approved by the Medical Ethical Committee of Zhongshan Hospital (B2020-392). Data from the first 9 months were used as the training set, and data from the last 3 months were used as the testing set.

ARTIFICIAL ERROR

The use of artificial error to simulate out-of-control situations in PBRTQC research is a common methodology because real out-of-control records are rare. Three types of analytical error were added: constant error (CE), random error (RE), and proportional error (PE). The formulas for the errors are:

The RE was added to individual test values, denoted by x_i in the RE formula. Random() is a function that randomly picks a number from the interval [-1, 1], so each data point can receive either a positive or a negative random error. TEa is the total allowable error.
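
The error-injection step can be sketched in Python. The exact CE and PE formulas are not reproduced in this text, so the forms below are assumptions: CE adds a fixed shift, PE multiplies each result by (1 + fraction), and RE adds x_i · TEa · Random() to each result, with Random() drawn uniformly from [-1, 1]. The function names are hypothetical.

```python
import random

def add_constant_error(values, shift):
    """CE (assumed form): add the same shift, in analyte units, to every result."""
    return [x + shift for x in values]

def add_proportional_error(values, fraction):
    """PE (assumed form): scale every result by (1 + fraction), e.g. fraction = TEa."""
    return [x * (1 + fraction) for x in values]

def add_random_error(values, tea, rng=random):
    """RE (assumed form): x_i + x_i * TEa * Random(), Random() uniform on [-1, 1]."""
    return [x + x * tea * rng.uniform(-1.0, 1.0) for x in values]
```

Passing a seeded random.Random instance as rng makes the simulated errors reproducible across protocol comparisons.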

RARTQC FRAMEWORK

We define RARTQC as an extension of the original PBRTQC protocol that adds one step: regression analysis is used to adjust the test results, and the residuals of the regression model serve as the input for SPC algorithms such as the moving average (MA) and the exponentially weighted moving average (EWMA). An illustration of the framework is shown in Fig. 1. For the rest of the paper, PBRTQC refers specifically to the methodology described in Bietenbeck’s and our previous work (5, 6); it should not be confused with the broader PBRTQC concept, which also includes methodologies such as limit checks and delta checks (9).

WINSORIZATION AND BOX–COX TRANSFORMATION

Bietenbeck et al. reported that winsorization and Box–Cox transformation can improve the performance of PBRTQC protocols (6). We included both steps at the beginning of our proposed framework, because winsorization and Box–Cox transformation are also common preprocessing steps for regression analysis. The truncation limits for winsorizing were selected in the optimization process, and the lambda parameters were chosen by maximum likelihood using the R package bestNormalize (10). After the Box–Cox transformation, z-score standardization was applied to the resulting data. Because the standardized results of every analyte share the same mean (0) and standard deviation (1), the effect of the regression adjustment in the next step can easily be compared across datasets.
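
A minimal pure-Python sketch of this preprocessing chain follows: winsorize at percentile-based truncation limits, pick the Box–Cox lambda that maximizes the profile log-likelihood over a grid (a simple stand-in for the maximum-likelihood fit done by the R package cited above), and standardize to z-scores. The default limits and the lambda grid are placeholders, not the optimized values.

```python
import math
import statistics

def quantile(sorted_vals, q):
    """Linear-interpolation quantile of already sorted data, q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def winsorize(values, ltl, utl):
    """Clip values to the ltl-th and utl-th percentiles (e.g. 0.01 and 0.99)."""
    s = sorted(values)
    lo, hi = quantile(s, ltl), quantile(s, utl)
    return [min(max(v, lo), hi) for v in values]

def boxcox(values, lam):
    """Box-Cox transform; requires strictly positive values."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

def boxcox_loglik(values, lam):
    """Profile log-likelihood of lambda under a normal model for the transformed data."""
    y = boxcox(values, lam)
    n = len(y)
    m = sum(y) / n
    var = sum((v - m) ** 2 for v in y) / n  # MLE variance (divide by n)
    return -n / 2 * math.log(var) + (lam - 1) * sum(math.log(v) for v in values)

def fit_lambda(values, grid=None):
    """Grid-search maximum-likelihood lambda over [-2, 2] in steps of 0.01."""
    grid = grid or [i / 100 for i in range(-200, 201)]
    return max(grid, key=lambda lam: boxcox_loglik(values, lam))

def preprocess(train, ltl=0.01, utl=0.99):
    """Winsorize -> Box-Cox -> z-score; returns z-scores plus fitted lambda, mean, and SD."""
    w = winsorize(train, ltl, utl)
    lam = fit_lambda(w)
    y = boxcox(w, lam)
    mu, sd = statistics.fmean(y), statistics.pstdev(y)
    return [(v - mu) / sd for v in y], lam, mu, sd
```

In practice, the truncation limits, lambda, mean, and SD fitted on the training set would be reused unchanged on the testing set rather than refitted.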

REGRESSION ADJUSTMENT

Time is a key variable in quality control measurement series (4, 8). Ideally, the control process series is an independent and identically distributed random process with mean $\mu$ and random error $\varepsilon_t$:

$$x_t = \mu + \varepsilon_t$$

However, the process is usually not purely random, because the series contains some form of autocorrelation. Alwan and Bissell modeled the autocorrelation with an autoregressive integrated moving average (ARIMA) model (8). The time-series structure of patient sample data, however, is more complex than that of IQC data, and ARIMA modeling of it was inefficient. Instead, we used a multiple regression model in which a simplified autoregressive structure enters as a single regressor. The multiple regression model also allows more regressors to be added.

If the regression model fits well, the residuals should be normally distributed around zero without autocorrelation, and they can then be used as inputs for common SPC algorithms.
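
The residual step can be sketched as an ordinary least-squares fit on the training set, with observed minus fitted values passed to the SPC algorithm. To stay dependency-free, the sketch solves the normal equations by Gauss–Jordan elimination; this is an illustrative choice, and a numerical library would normally be preferred for ill-conditioned designs. The function names are hypothetical.

```python
def ols_fit(X, y):
    """Least-squares coefficients [intercept, b1, ..., bp] via (X'X) b = X'y."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda k, c=col: abs(A[k][c]))
        A[col], A[piv] = A[piv], A[col]
        pivot = A[col][col]
        A[col] = [v / pivot for v in A[col]]
        for k in range(p):
            if k != col:
                f = A[k][col]
                A[k] = [a - f * b for a, b in zip(A[k], A[col])]
    return [A[i][p] for i in range(p)]

def residuals(X, y, coef):
    """Observed minus fitted values; these are what the SPC algorithm monitors."""
    return [yi - coef[0] - sum(c * v for c, v in zip(coef[1:], row))
            for row, yi in zip(X, y)]
```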

To simplify the adjustment for autocorrelation, we calculated the average of the previous 2000 test values (an MA with N = 2000) as a baseline regressor. The number 2000 was chosen because it is close to the average daily test volume of the chosen analytes at our facility.
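
The baseline regressor, the mean of the previous 2000 results with the current result excluded, can be computed with a running sum. The text does not say how the first 2000 results (which lack a full history) were handled; returning None for them here is an assumption, and the function name is hypothetical.

```python
from collections import deque

def lagged_moving_average(values, window=2000):
    """Mean of the previous `window` results, excluding the current one.

    Returns None until `window` earlier results exist (assumed warm-up handling).
    """
    buf = deque()
    total = 0.0
    out = []
    for v in values:
        out.append(total / window if len(buf) == window else None)
        buf.append(v)
        total += v
        if len(buf) > window:
            total -= buf.popleft()  # drop the oldest result from the running sum
    return out
```

The running sum keeps the cost at O(1) per result; over very long series, periodically recomputing the sum avoids accumulated floating-point drift.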

In addition to the autocorrelation baseline, we included 5 more regressors (independent variables): age, sex, outpatient or inpatient status, engineered department information (enDepart), and engineered diagnosis information (enDiag). Besides age and sex, clinical information such as diagnosis and department can help predict a patient’s level of a specific analyte. Because this information is recorded as text, feature engineering is required to convert it into numerical variables. We created a three-level score system for each feature based on the average test value of the samples in which each keyword appeared. For example, if the training-set average test value for keyword X is in the top 25% of all keywords’ average test values, samples with X in the diagnosis are labeled “1” for enDiag. Similarly, samples with keywords in the middle 50% are labeled “0”, and samples with keywords in the bottom 25% are labeled “–1”. Because the primary language used at our facility is Chinese, we used the Jieba R package (11) to tokenize the diagnosis information. A detailed description of the feature engineering process is provided in the online Data Supplement.
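
The three-level keyword score can be sketched as follows, assuming diagnoses have already been tokenized into keywords (the Jieba step). Quartiles of the per-keyword training means define the top 25% and bottom 25%. How a sample with several keywords is scored is described only in the supplement, so averaging the matched keyword scores here is a hypothetical choice, as are the function names.

```python
import statistics

def keyword_scores(train_records):
    """Map each keyword to 1 (top 25% of keyword means), -1 (bottom 25%), or 0.

    train_records: iterable of (keywords, test_value) pairs from the training set.
    """
    sums, counts = {}, {}
    for keywords, value in train_records:
        for kw in set(keywords):  # count each keyword once per sample
            sums[kw] = sums.get(kw, 0.0) + value
            counts[kw] = counts.get(kw, 0) + 1
    means = {kw: sums[kw] / counts[kw] for kw in sums}
    q1, _, q3 = statistics.quantiles(means.values(), n=4)
    return {kw: 1 if m > q3 else -1 if m < q1 else 0 for kw, m in means.items()}

def sample_score(keywords, scores):
    """Hypothetical aggregation: mean score of known keywords, 0 if none match."""
    matched = [scores[kw] for kw in keywords if kw in scores]
    return statistics.fmean(matched) if matched else 0.0
```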

SPC ALGORITHMS

The SPC algorithms we chose in this study were MA, EWMA, and moving standard deviation (MovSD). The MA was selected as the baseline algorithm. The formula for each algorithm is shown next.

Details of each algorithm can be found in our previous report (5), in which EWMA performed best among all algorithms for CE and PE detection, whereas MovSD was best for detecting RE. A given algorithm usually has 5 hyperparameters that must be determined. An optimization process is required to determine the truncation limits (TLs, including the upper TL or UTL and the lower TL or LTL) and the block size (N). EWMA uses a smoothing parameter k instead of N, but k is directly related to N: k = 2/(N + 1). After these hyperparameters were optimized, the control limits (CLs, including the upper CL or UCL and the lower CL or LCL) could be calculated from the data and a desired false alarm rate (DFAR) set according to expert experience.
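
The three statistics and empirical control limits can be sketched as follows. Because the formulas referenced in the text are not reproduced here, these are the textbook forms of MA, EWMA (with k = 2/(N + 1) as stated above), and MovSD. Setting the limits at the lower and upper DFAR/2 quantiles of the training statistic, and starting EWMA at the target mean 0, are assumptions. An alarm is raised when the statistic falls below the LCL or above the UCL.

```python
import math
import statistics
from collections import deque

def moving_average(z, n):
    """MA: mean of the last n inputs; None until n inputs have been seen."""
    buf, out = deque(maxlen=n), []
    for v in z:
        buf.append(v)
        out.append(statistics.fmean(buf) if len(buf) == n else None)
    return out

def ewma(z, n, start=0.0):
    """EWMA with k = 2 / (N + 1), starting at the target mean (0 for z-scores)."""
    k = 2 / (n + 1)
    out, s = [], start
    for v in z:
        s = k * v + (1 - k) * s
        out.append(s)
    return out

def moving_sd(z, n):
    """MovSD: sample SD of the last n inputs; None until n inputs have been seen."""
    buf, out = deque(maxlen=n), []
    for v in z:
        buf.append(v)
        out.append(statistics.stdev(buf) if len(buf) == n else None)
    return out

def control_limits(train_stat, dfar=0.001):
    """Empirical (LCL, UCL), splitting the DFAR equally between the two tails."""
    s = sorted(x for x in train_stat if x is not None)
    lcl = s[math.floor(len(s) * dfar / 2)]
    ucl = s[math.ceil(len(s) * (1 - dfar / 2)) - 1]
    return lcl, ucl
```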

OPTIMIZATION AND EVALUATION

To obtain the best set of parameters for the protocols on the training set and to evaluate the algorithms’ performance on the testing set, simulation is required. A grid search was used to determine the optimized hyperparameters. In this study, we simulated N = 5, 7, 10, 20, 40, 60, 80, 100, 120, 140, and 160; LTLs = 0%, 1%, 2%, 3%, 5%, 10%, 15%, and 20%; and UTLs = 100%, 99%, 98%, 97%, 95%, 90%, 85%, and 80%. For evaluation, error detection charts were used to compare the performance of the protocols on the testing set (12, 13). The detailed optimization and evaluation process can be found in our previous paper (5).
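
The grid listed above contains 11 × 8 × 8 = 704 combinations per algorithm. The sketch below enumerates them and keeps the combination with the lowest score; the score function (for example, the training-set tANPed at TEa) is supplied by the caller and is hypothetical here.

```python
from itertools import product

# Grid values as listed in the text (truncation limits as percentile fractions).
BLOCK_SIZES = [5, 7, 10, 20, 40, 60, 80, 100, 120, 140, 160]
LOWER_TLS = [0.00, 0.01, 0.02, 0.03, 0.05, 0.10, 0.15, 0.20]
UPPER_TLS = [1.00, 0.99, 0.98, 0.97, 0.95, 0.90, 0.85, 0.80]

def grid_search(score):
    """Return the (N, LTL, UTL) combination with the lowest score(n, ltl, utl)."""
    return min(product(BLOCK_SIZES, LOWER_TLS, UPPER_TLS), key=lambda p: score(*p))
```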

METRICS

First, the FAR, as distinct from the DFAR, measures how often a protocol gives a false alarm. The FAR of the protocols on the training data should equal the DFAR up to rounding error. The FAR on the testing set can differ significantly from the DFAR, but it is more informative about how the protocol will perform on future data. We selected 0.1% as the DFAR in this study to be consistent with previous studies (5, 6).
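
To illustrate why the testing FAR can drift away from the DFAR, the toy sketch below counts alarms below the LCL and above the UCL separately. The numbers are synthetic, not study data: limits set on one series give the designed 10% rate on that series, but after an upward shift of the same series the alarms become lopsided toward the upper limit. The function name is hypothetical.

```python
def alarm_rates(stat, lcl, ucl):
    """Return (rate below LCL, rate above UCL, total FAR) for a monitored statistic."""
    vals = [s for s in stat if s is not None]  # skip warm-up points without a statistic
    low = sum(1 for s in vals if s < lcl) / len(vals)
    high = sum(1 for s in vals if s > ucl) / len(vals)
    return low, high, low + high

# Toy example: limits that give a 10% FAR on the training series ...
train = [float(i) for i in range(1000)]
print(alarm_rates(train, 50.0, 949.0))
# ... and the same limits after the series drifts upward by 30 units.
print(alarm_rates([x + 30.0 for x in train], 50.0, 949.0))
```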

The second metric was the trimmed average number of patients affected until error detection (tANPed), a modified version of ANPed, which is an alternative measure of protocol sensitivity. The details and rationale for using tANPed can be found in our previous paper (5). The tANPed at TEa was the criterion for choosing the best hyperparameters during optimization on the training set. The TEa for each analyte in this study (Na: 2%, Cl: 5%, ALT: 16%, and creatinine: 12%) was based on the Chinese National Standards (14). The tANPed for different error levels was assessed on the testing set when constructing the error detection charts.

tANPed was calculated from the number of patients affected until error detection (NPed), recorded on 1000 virtual days generated by bootstrapping. Details of the virtual days can be found in our previous paper (5). For each virtual day, we first selected a random point in the dataset and retrieved the subsequent 1000 + N - 1 samples to form a virtual-day data block. Next, errors were added starting from the Nth point in the data block. The protocols were then applied to the data block, and the NPed was calculated. The process was repeated 1000 times to calculate the tANPed.
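
The virtual-day bootstrap can be sketched as below. Several details live in the earlier paper and are assumptions here: NPed counts erroneous results up to and including the alarm, a day without an alarm is censored at the 1000 erroneous results it contains, tANPed trims 5% from each tail, and the MA detector is a hypothetical example protocol.

```python
import random
import statistics

def ma_detector(lcl, ucl):
    """Hypothetical protocol: alarm when the MA of the last n results leaves [lcl, ucl]."""
    def detect(block, n):
        for i in range(n - 1, len(block)):
            m = statistics.fmean(block[i - n + 1:i + 1])
            if m < lcl or m > ucl:
                return i  # index of the first alarm
        return None
    return detect

def nped_for_day(day, n, add_error, detect):
    """NPed for one virtual day of 1000 + n - 1 results, errors from the n-th point on."""
    block = day[:n - 1] + [add_error(x) for x in day[n - 1:]]
    first = detect(block, n)
    if first is None:
        return len(day) - (n - 1)  # assumption: censor at the number of erroneous results
    return first - (n - 1) + 1  # erroneous results reported up to and including the alarm

def trimmed_mean(xs, trim=0.05):
    """Mean after dropping the lowest and highest `trim` fraction (assumed trim level)."""
    s = sorted(xs)
    k = int(len(s) * trim)
    return statistics.fmean(s[k:len(s) - k])

def tanped(data, n, add_error, detect, days=1000, trim=0.05, seed=0):
    """Bootstrap `days` virtual days from random start points; return trimmed mean NPed."""
    rng = random.Random(seed)
    length = 1000 + n - 1
    npeds = []
    for _ in range(days):
        start = rng.randrange(len(data) - length + 1)
        npeds.append(nped_for_day(data[start:start + length], n, add_error, detect))
    return trimmed_mean(npeds, trim)
```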

COMPARATIVE STUDY

In this study, the most important goal was to compare the performance of optimized RARTQC protocols with that of the optimized PBRTQC protocols in different settings. We obtained optimized PBRTQC protocols with MA, EWMA, and MovSD on 4 analytes with 3 types of added analytical error, and compared their performances with the respective optimized RARTQC protocols on the testing set. Both FAR and error detection charts were used to assess whether adding the regression step helped to improve the original PBRTQC protocols.

STATISTICAL ENVIRONMENT

All data preprocessing, analysis, and simulation were performed using R version 4.0.1 (15). The code and resampled data are included in the online Data Supplement.

Results

ESTIMATION OF THE AUTOCORRELATION STRUCTURE

Supplemental Fig. 1 shows the autocorrelation function plots for each analyte. The plots showed complex autocorrelation for all analytes. We then aggregated the data into averages for each day; Supplemental Fig. 2 shows the strong autocorrelation found among daily averages.
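
The autocorrelation functions can be computed with the standard sample ACF, on individual results or on daily averages. The sketch below is illustrative; the function names are hypothetical and the authors’ own R code may differ.

```python
import statistics

def acf(xs, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag (standard biased estimator)."""
    mu = statistics.fmean(xs)
    dev = [x - mu for x in xs]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(len(dev) - k)) / denom
            for k in range(max_lag + 1)]

def daily_means(values, day_labels):
    """Aggregate results into one mean per day, in order of first appearance."""
    groups = {}
    for v, d in zip(values, day_labels):
        groups.setdefault(d, []).append(v)
    return [statistics.fmean(g) for g in groups.values()]
```

Running acf on daily_means(...) rather than on individual results reproduces the day-level view used for the second supplemental figure.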

REGRESSION MODELS

The coefficients of the regression models for the RARTQC protocols are listed in Table 1. All coefficients were significantly different from zero according to the Wald test. The residual standard error and adjusted R-squared are also included in Table 1.

EFFECT OF REGRESSION ON DATA DISTRIBUTION

Table 2 and Fig. 2 summarize the effects of regression in RARTQC in comparison with PBRTQC, which uses only winsorization and Box–Cox transformation for data transformation. Figure 2 shows how regression can remove large trends and autocorrelation from the data. To further demonstrate this observation, we used a cross-validation process to calculate a cross-validated Cohen’s d (CV Cohen’s d), combined with a paired t-test, to show how the different types of data transformation affect those trends. CV Cohen’s d was calculated by dividing the dataset into 11 consecutive-month pairs (January–February, February–March, etc.); the average of the 11 Cohen’s ds from these pairs was defined as the CV Cohen’s d, and the set of 11 Cohen’s ds was used to conduct the paired t-tests. Table 2 indicates that the regression model significantly reduces the CV Cohen’s d for all analytes, whereas winsorization and Box–Cox transformation do not. The regression also improves the normality of the distribution, as shown in Table 2 and online Supplemental Fig. 3: it reduces both the kurtosis and the skewness relative to the original and PBRTQC-transformed data.
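
The CV Cohen’s d procedure can be sketched as follows. Two points are assumptions: d uses the pooled SD, and the absolute values of the 11 monthly-pair ds are averaged (the text does not say whether signed or absolute values were used). The paired t statistic is computed by hand; a p-value needs the t distribution with 10 degrees of freedom, for example via scipy.stats.ttest_rel. The function names are hypothetical.

```python
import math
import statistics

def cohens_d(a, b):
    """Standardized mean difference between two samples using the pooled SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(b) - statistics.fmean(a)) / math.sqrt(pooled_var)

def monthly_pair_ds(months):
    """months: 12 lists of results (January..December) -> 11 consecutive-pair ds."""
    return [cohens_d(months[i], months[i + 1]) for i in range(len(months) - 1)]

def cv_cohens_d(months):
    """Average absolute d over consecutive-month pairs (absolute value is assumed)."""
    return statistics.fmean(abs(d) for d in monthly_pair_ds(months))

def paired_t(x, y):
    """Paired t statistic for two equal-length sets of ds (e.g. raw vs adjusted)."""
    diffs = [a - b for a, b in zip(x, y)]
    return statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))
```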

COMPARISON RESULTS

The complete optimization grid-search results are listed in online Supplemental Table 1. The parameters of the optimized protocols for each analyte and each type of analytical error are listed in online Supplemental Table 2, which also shows that the testing-set FAR was close to or below the 0.1% DFAR in all cases for both RARTQC and PBRTQC. The error detection curves are shown in Figs. 3 and 4. RARTQC outperformed PBRTQC with every algorithm in detecting CE and PE for all analytes, and all RARTQC curves in the error detection chart for CE and PE detection are much more symmetric than those of the PBRTQC protocols. For RE detection, all algorithms performed poorly: RARTQC protocols performed slightly better than PBRTQC for sodium, chloride, and ALT, but worse for creatinine. Online Supplemental Tables 3 and 4 summarize the same results numerically as Figs. 3 and 4.

Discussion

In this study, we demonstrated that our proposed RARTQC protocol outperformed the PBRTQC protocol in detecting CE and PE. The regression step in the RARTQC framework improves on PBRTQC in 3 ways. First, the simplified autoregressive baseline regressor removes the complex autocorrelation among patient sample results. Second, the multiple regression model allows additional important variables to be added to the protocols. Finally, using the residuals of the regression model brings the data closer to a normal distribution. Given its effectiveness and simple structure, we consider RARTQC an improved framework for future real-time quality control research in the clinical laboratory.

Using the simplified autoregression term in the regression adjustment was the most crucial step in applying the idea proposed by Alwan and Bissell (8) to patient data. Dealing with autocorrelated control process data has long been an important topic in SPC, and numerous solutions have been proposed to tackle the problems it causes. Alwan and Bissell’s paper is considered one of the most influential contributions to this problem in the history of SPC research. Although their method can be readily applied to IQC data, patient data are much harder to model with a standard ARIMA model, because the true autoregressive structure is at the day level, not the sample level, as the autocorrelation function plots in the Supplemental material clearly show. Using the average of the previous 2000 values as the baseline variable in a simple regression model resolves this complexity. The baseline regressor removes daily-average autocorrelation, seasonality, small shifts caused by reagent lot changes and calibration, and other subtle changes in the analytical process that have no significant clinical impact. Figure 2 shows that the RARTQC protocols produce a more concentrated distribution than the PBRTQC protocols, which means that RARTQC control limits can be much narrower than PBRTQC control limits at the same DFAR.

Additional benefits of the autoregressive baseline regressor include a more symmetric error detection chart and more predictable performance on the testing set, as shown in Figs. 3 and 4. In our previous paper, we discussed how autocorrelation affects a protocol’s performance on future data (5). For example, if the distribution of an analyte shifted slightly upward in the future because of a change in the patient population, the previously determined CLs of PBRTQC protocols would become more sensitive to positive-direction errors, giving more positive false alarms, and less sensitive to negative-direction errors, giving fewer negative false alarms. The curves in the error detection chart would shift to the left, and both ANPed and FAR would be affected unpredictably. This would not be an issue with RARTQC, because the baseline regressor absorbs such gradual shifts in the population distribution.

The RARTQC described in this study focuses on detecting sudden, clinically significant errors or changes. The RARTQC protocols are not suitable for detecting errors that accumulate gradually over a long period, because the baseline regressor treats such gradual changes as part of the autocorrelation rather than as an error. To address this, the baseline can be calculated from a longer history, such as a month or a year of data. A related characteristic of RARTQC protocols designed to detect sudden errors is that they stop giving alarms if an error persists: once the error has persisted long enough, the baseline regressor absorbs it and the model treats it as autocorrelation. Running an additional RARTQC protocol with a long baseline term alongside the standard protocol can identify this issue. The exact QC plan for RARTQC should be evaluated further in real-world practice.

In this study, RARTQC with EWMA was the best protocol for both CE and PE detection. The improvement from MA to EWMA was small, which is consistent with our previous findings (5). Interestingly, EWMA has been reported to be an effective way to estimate the predictions of a simpler time-series model, the autoregressive model (16). This additional autoregressive behavior might explain the improvement in performance from MA to EWMA.

For RE detection, we changed how RE was added compared with our previous study, which added variance to the overall data distribution; the new method adds a random error to each test result based on TEa. Both PBRTQC and RARTQC performed poorly on RE detection. Although RARTQC showed a slight improvement over PBRTQC for 3 of the analytes, no firm conclusion can be drawn from these results. To build a better regression model for RE, a generalized autoregressive conditional heteroscedasticity (GARCH) model could be considered to model the change in variance over time.

The capability to add patient information is a key characteristic of the RARTQC framework: performance can be improved as long as the regression model can be improved. Beyond the 6 variables used in this study, more patient information could be used, such as related analyte results, together with a release-from-the-back method that releases all results only after all tests are completed (17). Luo et al. mentioned this idea in their study predicting test results from other test results (18). They proposed that a significant deviation between the predicted and observed values could indicate certain preanalytical or analytical errors, but the idea never had a proper framework for implementation. Sampson et al. also proposed a CUSUM-logistic model that uses similar information (19), but that model had a similar implementation problem. RARTQC can take advantage of such information. Other valuable information, such as IQC results and proficiency testing results, can also be used as variables to fit the regression model, potentially integrating all components of the current quality control system. In addition, more advanced regression models could potentially provide better performance.

In general, we believe that we have provided a powerful framework for real-time quality control research. We hope that an increasing number of researchers in the field will create improved models based on the RARTQC framework.
