Assistant Professor, University of Pittsburgh, Pittsburgh, Pennsylvania, United States
Introduction: Rheumatoid arthritis (RA) is a complex autoimmune disease with a polyetiological genetic basis. Serum rheumatoid factor (RF) and anti-cyclic citrullinated peptide (CCP) antibodies are used to diagnose RA. However, it is unknown whether the corresponding serological profiles map to distinct endotypes of RA. To address this, we first dissected differences across ~900 RA patients, half of whom were serologically CCP+RF+ (i.e., double positive, DP) and half of whom were RF+ alone (RF). Surprisingly, heritability differed significantly between these groups (~30%), suggesting fundamental differences in the genetic risk underlying these two forms of RA. Next, we carried out a genome-wide association study (GWAS) and found that the HLA locus explained part, but not all, of the difference in heritability between DP and RF RA. To investigate the missing heritability, we implemented a network-based GWAS approach. We adapted Linkage Disequilibrium Adjusted Kinships (LDAK) to aggregate the impact of multiple regulatory SNPs associated with a gene into a single score that takes the underlying LD structure into account. Using network propagation, we then identified modules that significantly explain the differences in heritability between DP and RF RA. These modules include HLA genes but also capture other cytokines, chemokines and immune regulators, and together they account for almost the entire difference in heritability. We further validated these modules by recapitulating some of the corresponding differences at the transcriptomic and proteomic levels. Together, our results suggest that DP and RF RA are distinct disease endotypes with different genetic bases and pathophysiology.
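The idea of collapsing several correlated SNPs into one gene-level score can be illustrated with a short sketch. It does not reproduce LDAK-GBT. As an assumption, it swaps in a simpler, well-known LD-aware statistic: the quadratic form T = z^T R^-1 z of the per-SNP z-scores z and the SNP-SNP LD correlation matrix R, which for a full-rank R is chi-square distributed with k degrees of freedom (k SNPs) under the null. The function name `ld_aware_gene_score` and the small ridge term are hypothetical choices made for illustration.

```python
import numpy as np


def ld_aware_gene_score(z, ld, ridge=1e-3):
    """Combine per-SNP z-scores for one gene into a single LD-aware statistic.

    Illustrative stand-in, NOT LDAK-GBT: computes T = z^T (R + ridge*I)^-1 z.
    For full-rank R and z ~ N(0, R), z^T R^-1 z is chi-square with len(z) df.
    """
    z = np.asarray(z, dtype=float)
    ld = np.asarray(ld, dtype=float)
    k = z.shape[0]
    # A small ridge keeps the LD matrix invertible when SNPs are in near-perfect LD.
    reg = ld + ridge * np.eye(k)
    # Solve the linear system instead of forming an explicit inverse (more stable).
    return float(z @ np.linalg.solve(reg, z))


# Toy example: two SNPs that each carry z = 3.
z = np.array([3.0, 3.0])
perfect_ld = np.array([[1.0, 1.0], [1.0, 1.0]])  # SNPs in perfect LD
independent = np.eye(2)                           # SNPs in linkage equilibrium

s_ld = ld_aware_gene_score(z, perfect_ld)
s_ind = ld_aware_gene_score(z, independent)
print(round(s_ld, 3), round(s_ind, 3))
```

In this toy example, the two SNPs in perfect LD yield a score of about 9 rather than the naive sum of squares (18), because the second SNP adds no independent evidence; the same z-scores at independent SNPs give about 18.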
Materials and Methods: To quantify differences in the genetic basis of CCP+ and CCP- RA, we first estimated heritability, a measure of the proportion of phenotypic variation attributable to genetic factors. We computed the heritability of each subgroup using individual-level genotypes of subjects in the RACER cohort. After excluding variants with a minor allele frequency below 0.05, approximately 7,200,000 SNPs were included in the analysis. We applied the LDAK REML approach. Consistent with the higher disease burden of RA in females, the cohort had an unequal distribution of subjects by sex, so we included sex as a fixed effect in our analysis. In addition, we included the first principal component from PCA as a fixed effect to control for potential confounding arising from population structure. Next, we performed a traditional GWAS, which identified only the HLA locus as discriminating between CCP+ and CCP- RA. This locus did not explain most of the difference in heritability. To address this, we used a novel network GWAS approach in which we summarized genetic variation into gene scores using LDAK-GBT, a linkage disequilibrium-informed method. These scores were then propagated over a high-quality reference protein interactome network using random walk with restart to identify high-scoring network modules, with significance assessed by permutation testing, that explained most of the difference in heritability. These modules were validated using additional criteria as described below.
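The propagation and permutation steps can be sketched as follows. This is a minimal illustration that assumes a symmetric adjacency matrix for the interactome, nonnegative gene scores as seeds, a column-normalized transition matrix, and a restart probability of 0.5; these settings and all function names are hypothetical and are not taken from the study. Significance is assessed here by shuffling gene scores across nodes, which is one simple permutation scheme among several, and the extraction of modules from significant genes is not shown.

```python
import numpy as np


def column_normalize(adj):
    """Column-normalize an adjacency matrix so each non-empty column sums to 1."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=0)
    deg[deg == 0] = 1.0  # isolated nodes: avoid division by zero
    return adj / deg


def random_walk_with_restart(w, p0, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until convergence."""
    p0 = np.asarray(p0, dtype=float)
    p0 = p0 / p0.sum()  # treat (nonnegative) seed scores as a distribution
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (w @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


def permutation_pvalues(adj, scores, restart=0.5, n_perm=1000, seed=0):
    """Empirical per-gene p-values: observed propagated score vs. scores
    obtained after randomly shuffling the seed scores across nodes."""
    rng = np.random.default_rng(seed)
    w = column_normalize(adj)
    observed = random_walk_with_restart(w, scores, restart)
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        null = random_walk_with_restart(w, rng.permutation(scores), restart)
        exceed += null >= observed
    return observed, (exceed + 1) / (n_perm + 1)


# Toy network: a path 0-1-2-3-4 with a strong hypothetical seed score at gene 0.
path = np.zeros((5, 5))
for i in range(4):
    path[i, i + 1] = path[i + 1, i] = 1.0

obs, pvals = permutation_pvalues(path, [10.0, 1.0, 1.0, 1.0, 1.0], n_perm=200)
print(np.round(obs, 3), np.round(pvals, 3))
```

Because each gene's observed score is compared with its own null distribution on the same network, this scheme partly accounts for the tendency of highly connected genes to accumulate propagated signal; degree-preserving network randomizations are a common alternative.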
Results, Conclusions, and Discussions: After accounting for sex and population stratification, the heritability difference between CCP+ and CCP- RA subjects was estimated to be 30%, highlighting significant differences in the genetic factors underlying these disease subtypes. However, a conventional GWAS within our cohort of ~900 patients identified only the HLA loci as discriminating between these subtypes, and the HLA locus explained only ~3% of the difference in heritability. To better explain the missing heritability, we used a novel network GWAS approach. By propagating summary gene scores over the human protein interactome with a random walk with restart algorithm, we identified 14 modules that significantly explain the across-group difference in heritability. Unlike the conventional GWAS, these modules capture almost the entire difference in heritability. We further prioritized the modules using three orthogonal analyses: heritability partitioning, bulk RNA-seq data from sorted cell populations obtained from CCP-positive and CCP-negative RA patients, and predicted expression data from our cohort. Only modules that were significant in two of these three validation analyses were prioritized, resulting in three significant modules. The prioritized modules reveal that an interplay between HLA and immune regulator genes underlies the phenotypic differences observed between the two subgroups. The role of the HLA loci has been established previously, and some evidence points to the involvement of interferon pathways. These modules provide novel insights into the pathways that underlie the observed phenotypic differences. Overall, our results highlight the utility of combining networks with GWAS approaches to identify genetic loci in disease contexts that may have only small patient cohorts.
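The prioritization rule can be expressed as a small filter. This sketch reads "significant in two of the three validation analyses" as "at least two"; the function name, the module names and the true/false outcomes below are hypothetical placeholders, not results from the study.

```python
def prioritize_modules(validation_results, min_support=2):
    """Return (sorted) modules that are significant in at least `min_support`
    validation analyses.

    validation_results maps module name -> {analysis name: bool}, where True
    means the module was significant in that analysis.
    """
    return sorted(
        name
        for name, tests in validation_results.items()
        if sum(bool(v) for v in tests.values()) >= min_support
    )


# Hypothetical outcomes for three modules across the three validation analyses.
example = {
    "module_A": {"heritability_partitioning": True, "bulk_transcriptomic": True, "predicted_expression": False},
    "module_B": {"heritability_partitioning": True, "bulk_transcriptomic": False, "predicted_expression": False},
    "module_C": {"heritability_partitioning": False, "bulk_transcriptomic": True, "predicted_expression": True},
}
selected = prioritize_modules(example)
print(selected)
```

Here module_B is dropped because it is supported by only one analysis, while module_A and module_C each pass two of the three.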
While GWAS remain a cornerstone of genetic research, it may not always be feasible to increase sample sizes to drive gene discovery. Nevertheless, network propagation can amplify the signals of genes that fall short of statistical significance because of limited study power, making it possible to identify additional disease-associated loci.