SEER Analysis Highlights Risk Factors for Advanced DLBCL at Diagnosis
Key Clinical Summary
- A logistic regression analysis of 126 774 patients in the Surveillance, Epidemiology, and End Results (SEER) program identified key predictors of advanced-stage diffuse large B-cell lymphoma (DLBCL) at diagnosis.
- Extranodal disease, age 65 years or older, and Hispanic ethnicity were among the strongest risk factors; the full model achieved an area under the curve (AUC) of 0.795.
- Findings highlight demographic disparities and support risk-stratified screening strategies in US oncology care.
DLBCL remains the most common subtype of non-Hodgkin lymphoma, and outcomes are strongly influenced by stage at diagnosis. A new analysis of SEER data (2000-2020) demonstrates that logistic regression modeling can effectively predict advanced-stage disease and identify high-risk populations. The study, presented at the 2026 National Comprehensive Cancer Network (NCCN) Annual Conference by investigators from multiple US institutions, emphasizes the role of demographic and anatomic factors in stage at presentation.
Study Findings
Using a SEER cohort of 126 774 patients, investigators applied a 70/30 training-validation split and performed multivariate logistic regression incorporating demographic and clinical variables. These included sex, race/ethnicity, marital status, primary tumor site, histology, and diagnostic method.
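The summary does not say which software the investigators used. As a minimal sketch only, assuming a toy cohort with two binary predictors (hypothetical names `extranodal` and `age_65_plus`) and made-up "true" coefficients, the Python below illustrates the two steps named above: a random 70/30 training-validation split and a logistic regression fitted by maximizing the log-likelihood with plain gradient ascent. All function names are hypothetical, and a real SEER analysis would rely on an established statistics package rather than this from-scratch loop.

```python
import math
import random


def sigmoid(z):
    """Logistic function mapping log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def make_synthetic_cohort(n, seed=0):
    """Hypothetical toy cohort: two binary predictors and a binary advanced-stage label."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        extranodal = 1 if rng.random() < 0.4 else 0
        age_65_plus = 1 if rng.random() < 0.5 else 0
        # Made-up "true" log-odds, used only to simulate labels (not study values).
        true_logit = -1.0 + 1.0 * extranodal + 0.5 * age_65_plus
        label = 1 if rng.random() < sigmoid(true_logit) else 0
        rows.append(([extranodal, age_65_plus], label))
    return rows


def train_validation_split(rows, train_frac=0.7, seed=42):
    """Shuffle a copy of the rows and cut it into training and validation sets."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def fit_logistic(X, y, lr=1.0, epochs=500):
    """Fit an intercept plus one coefficient per predictor by batch gradient
    ascent on the mean log-likelihood. Returns [intercept, b1, b2, ...]."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            residual = yi - p
            grad[0] += residual
            for j, xj in enumerate(xi):
                grad[j + 1] += residual * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


cohort = make_synthetic_cohort(1000)
train, validation = train_validation_split(cohort, train_frac=0.7)
weights = fit_logistic([x for x, _ in train], [y for _, y in train])
# Exponentiated coefficients are the adjusted odds ratios.
odds_ratios = [math.exp(b) for b in weights[1:]]
print("intercept:", round(weights[0], 3), "odds ratios:", [round(o, 2) for o in odds_ratios])
```

Exponentiating each fitted coefficient gives an adjusted odds ratio, which is the scale on which the study reports its predictors; the held-out validation rows would then be scored to assess discrimination.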
Six key predictors of advanced-stage (stage III/IV) DLBCL were identified, all statistically significant (P < .001). The strongest association was with extranodal primary site involvement, with an odds ratio (OR) of 2.8 (95% CI, 2.4-3.2). Age ≥65 years also significantly increased risk (OR 1.6; 95% CI, 1.4-1.8), and Hispanic ethnicity was associated with a higher likelihood of advanced-stage disease (OR 1.4; 95% CI, 1.2-1.6).
Conversely, marital status appeared protective, with an OR of 0.85 (95% CI, 0.77-0.94). Sex-based disparities were also observed (P = .01), indicating differences in stage at presentation between male and female patients.
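Because a logistic model is linear on the log-odds scale, each reported OR corresponds to a coefficient β = ln(OR), and a 95% Wald interval is exp(β ± 1.96·SE). The sketch below, a back-of-envelope check rather than a reanalysis, inverts that relationship to recover approximate coefficients and standard errors from the published estimates. It assumes the reported intervals are Wald intervals, which the summary does not state; because the ORs and limits are rounded, the recovered values are only approximate. The helper name `log_odds_and_se` is hypothetical.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def log_odds_and_se(odds_ratio, ci_low, ci_high, z=Z_95):
    """Recover the log-odds coefficient and an approximate standard error
    from a reported OR and its 95% (assumed Wald) confidence interval."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se


# Adjusted estimates as reported in the study summary: (OR, lower, upper).
reported = {
    "extranodal primary site": (2.8, 2.4, 3.2),
    "age >= 65 years": (1.6, 1.4, 1.8),
    "Hispanic ethnicity": (1.4, 1.2, 1.6),
    "marital status": (0.85, 0.77, 0.94),
}

for name, (or_, lo, hi) in reported.items():
    beta, se = log_odds_and_se(or_, lo, hi)
    print(f"{name}: beta = {beta:.3f}, approx. SE = {se:.3f}")

# Illustrative only: in a main-effects model (no interaction terms assumed),
# log-odds add, so the adjusted odds ratios for the three risk factors multiply.
combined_or = 2.8 * 1.6 * 1.4
print(f"combined OR (illustrative): {combined_or:.2f}")
```

The last calculation illustrates a general property of logistic models rather than a study result: if the model contains no interaction terms (not stated in the summary), a patient with all three risk factors would have adjusted odds roughly 2.8 × 1.6 × 1.4 ≈ 6.3 times those of an otherwise comparable patient with none of them.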
The multivariate model showed good discrimination, with an AUC of 0.795 (95% CI, 0.782-0.808), sensitivity of 0.74, and specificity of 0.78. In comparison, a demographics-only model yielded a lower AUC of 0.712, underscoring the importance of incorporating clinical variables such as tumor site.
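The summary reports discrimination (AUC) and classification metrics (sensitivity, specificity) without describing how they were computed. A minimal sketch, assuming predicted probabilities for a validation set are already available: the AUC equals the probability that a randomly chosen advanced-stage case receives a higher score than a randomly chosen early-stage case (the Mann-Whitney formulation), while sensitivity and specificity depend on a chosen probability cutoff. The 0.5 cutoff, the four toy scores, and the function names are hypothetical; the study's threshold is not reported.

```python
def auc_mann_whitney(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the positive
    case scores higher; tied pairs count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold=0.5):
    """Classify score >= threshold as advanced stage, then compare with labels."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy validation set: 1 = advanced stage (III/IV), 0 = early stage.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
auc = auc_mann_whitney(labels, scores)  # 3 of 4 pairs ordered correctly -> 0.75
sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)  # -> 0.5, 0.5
print(auc, sens, spec)
```

Scoring the same validation set with a full model and with a demographics-only model, then comparing the two AUC values, mirrors the 0.795 versus 0.712 comparison. The pairwise loop is quadratic in cohort size, so a rank-based computation would be preferable for tens of thousands of patients.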
Clinical Implications
These findings have implications for oncology clinical pathways and population health management. The identification of extranodal disease as the strongest predictor reinforces the need for heightened vigilance in patients presenting with non-nodal involvement.
Age- and ethnicity-based disparities suggest that targeted outreach and earlier diagnostic interventions may be warranted in older adults and Hispanic populations. For oncology payers and pathway developers, integrating such predictive models could support risk-adjusted screening protocols and resource allocation.
Importantly, the improved performance of the full model compared with demographics alone highlights the value of comprehensive data integration. Incorporating both biologic and social determinants of health into predictive analytics may enhance early detection strategies and reduce late-stage presentations.
These results also support broader adoption of machine-learning-adjacent statistical methods, such as logistic regression, in analyses of real-world oncology data. Such tools can help refine clinical pathways by identifying patients most likely to benefit from early intervention.
The study authors conclude that “logistic regression delivers reliable forecasting of advanced DLBCL stage,” emphasizing its utility in identifying high-risk profiles for enhanced surveillance. They further highlight that inclusion of both demographic and anatomic variables improves predictive accuracy and reveals disparities in presentation.
Conclusion
This large SEER-based analysis demonstrates that logistic regression can effectively predict advanced-stage DLBCL and uncover clinically meaningful disparities. Integrating such models into oncology pathways may enable earlier diagnosis, targeted screening, and improved outcomes across diverse patient populations.
Reference
Sinha S, Song J, Cheema AY, Zhang R, Munir M. Predictors of advanced-stage DLBCL at diagnosis: logistic regression analysis highlights high risk populations with inclusion of demographic and anatomic risk factors. Presented at: 2026 NCCN Annual Conference; March 27-March 29, 2026; Orlando, Florida, and virtual.


