BIOMARKERS FOR ALZHEIMER'S DISEASE
The present invention pertains to the domain of Alzheimer's disease (AD), and provides new genetic markers for this disease, distributed on 17 genes initially selected by transcriptomic analysis because of their over- or under-expression in the cerebral tissue of AD cases compared to normal individuals. Some of these markers can also be used for predicting an increased risk of developing an early-onset form of Alzheimer's disease.
AD is already a major problem of public health in our developed countries. About 5% of the people aged 65 or above are affected with AD and the prevalence rises steeply to 19% after age 75 and to 47% after age 85.
In brain tissue, two neuropathological lesions characterize this pathology:
- an abnormal accumulation of extracellular deposits, mainly composed of a peptide of 40 to 42 amino acids called amyloid peptide (or Aβ);
- an intraneuronal aggregation of modified cytoskeletal proteins in the form of paired helical filaments (PHF).
It is unlikely that a single causal factor is responsible for the development of these two brain lesions. AD indeed appears to be a multifactorial and complex pathology. Its onset and progression, like those of most multifactorial diseases, are conditioned by the interaction of various genetic susceptibility factors and environmental risk factors.
At the clinical level, the age at which the first symptoms appear, although difficult to determine, makes it possible to distinguish the relatively rare early-onset forms, affecting patients before the age of 65, from the frequent late-onset forms arising after 65.
Most early-onset forms of AD are hereditary and are transmitted in a Mendelian autosomal dominant mode. Although these cases represent less than 2% of all forms of AD, the discovery of the genes involved in these familial forms has led to a better understanding of some pathophysiological mechanisms (Campion, Dumanchin et al. 1999).
To date, pathogenic mutations have been discovered in the amyloid precursor protein gene (APP) (the protein whose metabolism produces the Aβ peptide) and in the presenilin 1 (PS1) and presenilin 2 (PS2) genes. These mutations are associated with a specific increase in the secretion of the Aβ 1-42 peptide, the most toxic form of all amyloid peptides (Cruts and Van Broeckhoven 1998).
These modifications in APP metabolism led to the proposal of a pathophysiological hypothesis based on the concept of the amyloid cascade.
Although published data suggest the existence of 4 major genes, only the ε4 allele of the apolipoprotein E gene (APOE) is recognized as a genetic determinant of AD. The presence of this allele is associated with a 3- to 5-fold increase in the risk of developing AD (Farrer, Cupples et al. 1997).
The difficulties encountered in characterizing new susceptibility genes in AD are partially related to methodological problems. Indeed, these approaches are organized essentially around two axes: genome-wide linkage or linkage disequilibrium (LD) studies on familial (early-onset) AD, and case-control studies in late-onset AD based on a bibliographical gene selection. LD studies are very well suited to the search for pathogenic mutations responsible for early-onset AD forms that show a monogenic segregation (Kamboh 2004). However, this approach seems less appropriate when several genetic factors are involved, whose impact differs from one family to another, especially because of the involvement of environmental factors. Such LD studies are carried out by genomic screening. This technique consists in studying the segregation of genetic markers regularly spaced along the genome through several generations. This screening makes it possible to define loci of interest likely to contain a gene implicated in the pathology.
However, the current studies based on this methodology and using populations suffering from early-onset AD forms have defined chromosomal regions of interest sometimes extending over more than 60 centimorgans (Roberts, MacLean et al. 1999). These regions contain hundreds of genes, and the systematic search for mutations or for pathogenic polymorphisms is consequently a very heavy task.
Case-control studies can also seem very effective for bringing to light interactions between the studied factors (genetic or environmental) and/or modest effects on the risk of developing the disease (Bertram, McQueen et al. 2007). However, this approach raises numerous problems regarding the selection of the studied genes, the quality and size of the studied populations, and the functionality of the incriminated polymorphisms. This latter point is crucial because, following the complete sequencing of the human genome, a considerable number of polymorphisms are now available in databases for this kind of study.
Hence, despite the intrinsic qualities of each method, the obtained results could not facilitate the identification of new genetic markers of AD.
In this context, the inventors developed filters in order to decrease the number of genes, and thus of polymorphisms, to be studied. Their original approach, described in more detail in the experimental part which follows, is based on the hypothesis that new genetic markers of AD could be identified by analysing the differential expression of genes located in chromosomal regions of interest (Bensemain, Hot et al. 2007).
By performing such an analysis on two sub-populations of individuals of Caucasian origin (a French cohort of 1370 individuals and an American cohort of 1700 individuals), the inventors identified single nucleotide polymorphisms (or Tag-SNPs), located in 17 different genes, which are associated with the risk of developing Alzheimer's disease. These genes are indicated in Table 1 below, and the Tag-SNPs are detailed in Table 2.
Gene | Abbreviation | Ref. Seq | Chr. | Number of associated SNPs | Association with age at onset
Myosin X | MYO10 | NM_012334 | 5 | 24 | yes
Integrin alpha 2 precursor | ITGA2 | NM_002203 | 5 | 3 |
Butyrophilin, subfamily 3, member A2 | BTN3A2 | NM_007047 | 6 | 2 |
Tenascin XB isoform 1 | TNXB | NM_019105 | 6 | 4 |
Advanced glycosylation end product-specific | AGER | NM_001136 | 6 | 4 |
Major histocompatibility complex, class II, DR | HLA-DRA | NM_019111 | 6 | 5 |
PHF1 | PHF1 | NM_002636 | 6 | 3 |
Cyclin D3 | CCDN3 | NM_001760 | 6 | 3 |
Interleukin 33 / Nuclear factor from high endothelial venules | IL-33/NF-HEV | NM_033439 | 9 | 1 |
FLJ20375 or KIAA1797 | FLJ20375 | AK000382 | 9 | 9 |
Beta-1,4-galactosyltransferase 1 | B4GALT1 | NM_001497 | 9 | 5 |
Ciliary neurotrophic factor receptor | CNTFR | NM_001842 | 9 | 7 | yes
Carbonic anhydrase IX precursor | CA9 | NM_001216 | 9 | 1 |
KIAA1462 | KIAA1462 | NM_020848 | 10 | 3 | yes
Calcium/calmodulin-dependent protein kinase II gamma | CAMK2G | NM_001222 | 10 | 7 |
Angiopoietin 4 | ANGPTL4 | NM_009641 | 20 | 6 | yes
Spermine oxidase | SMOX | NM_175840 | 20 | 2 |
Table 1 : list of genes having at least one Tag-SNP associated with the risk of developing Alzheimer's disease.
Table 2: list of the SNPs showing a significant association with the risk of developing AD based on statistical analyses with recessive, dominant or co-dominant models. P values were adjusted for age, gender and the genetic status of the APOE gene. The allele indicated in bold is the more frequent one.
The inventors also identified 16 SNPs which are associated with an increased risk, for an individual, of developing an earlier-onset form of Alzheimer's disease. This means that people having, for at least one of these SNPs, an unfavourable form of said polymorphism, statistically develop Alzheimer's disease at least one year earlier than other people developing the disease. These SNPs are located in 4 different genes, as disclosed in Table 3 below.
Table 3 : SNPs associated with the age at onset.
Accordingly, the invention pertains to the use of at least one single nucleotide polymorphism (SNP) selected in the group consisting of rs153202, rs27430, rs2401987, rs26315, rs6881621, rs1445946, rs13356962, rs250339, rs10051929, rs12520877, rs4702173, rs17651023, rs6898592, rs17707609, rs17651165, rs31505, rs17614059, rs17651266, rs40985, rs716555, rs253315, rs2560852, rs7710976, rs153709, rs716436, rs4865756, rs11741738, rs9467731, rs3757142, rs3130285, rs3130286, rs185819, rs3130287, rs3134943, rs8365, rs3134940, rs1800684, rs3129876, rs3129881, rs9268658, rs8084, rs7192, rs3116713, rs442745, rs3106196, rs3218114, rs9529, rs3218086, rs7044343, rs10757145, rs1332322, rs4978111, rs7860490, rs7033259, rs6475473, rs6475474, rs10811438, rs10964779, rs3780491, rs10511909, rs2774272, rs10758194, rs1328898, rs2274592, rs2381164, rs10972149, rs10814123, rs10758268, rs12551429, rs3763613, rs3829078, rs6481654, rs9988732, rs2488024, rs11000780, rs10824037, rs10458656, rs2633310, rs2675662, rs2459446, rs2688614, rs6055551, rs3787566, rs4816050, rs13040505, rs730169, rs574628, rs1741296, rs6076623, rs1124740, rs6118095, rs4813852, rs4816051, rs6118137, rs4702179, rs2562343, rs152342, rs6476454, rs3763615, rs1063205, rs2478839, rs7071535, rs2014144 and rs9337951, as a genetic marker for determining the genetic predisposition of an individual to Alzheimer's disease. In a preferred embodiment of the invention, a group of 2 to 5, 6 to 10, 11 to 20 or even more than 20 SNPs selected in the above list is used as a genetic predisposition marker.
The present invention also pertains to the use of at least one SNP selected in the group consisting of rs1124740, rs6118095, rs4813852, rs4816051, rs6118137, rs17651266, rs4702179, rs2562343, rs152342, rs6476454, rs3763615, rs1063205, rs2478839, rs7071535, rs2014144 and rs9337951, as a genetic marker for determining the genetic predisposition of an individual to develop an early-onset form of Alzheimer's disease. Groups of 2, 3, 4, 5, 6 to 10 or even more SNPs selected in the above list can also be used to determine the predisposition of an individual to develop an early-onset form of Alzheimer's disease.
According to a particular embodiment, the present invention relates to a method for in vitro predicting an increased risk, for an individual, of developing Alzheimer's disease, comprising a step of performing a genotyping assay of at least one gene selected in the group consisting of myosin X, integrin alpha 2 precursor, NM_007047, tenascin XB isoform 1, advanced glycosylation end product-specific, major histocompatibility complex, class II, DR, NM_002636, cyclin D3, Interleukin-33, KIAA1797, NM_001497, ciliary neurotrophic factor receptor, carbonic anhydrase IX precursor, KIAA1462, calcium/calmodulin-dependent protein kinase II gamma, angiopoietin 4 and SMOX, in a biological sample from said individual. An example of a suitable sample for the SNP genotyping is blood. The SNP genotyping assay can be performed using any SNP analysis method known in the art. For example, hybridization methods can be used, with technologies such as macro- or micro-array, or with technologies based on probes which are specific for the changing base of the SNPs of interest (e.g., TaqMan® probes, molecular beacons, Scorpion probes, etc.). Other methods, such as restriction fragment length polymorphism (RFLP) methods, can also be used, as well as chromatography (dHPLC), capillary electrophoresis sequencing, mass spectrometry, or specific PCR. Various methods, such as SNPplex™, Amplifluor SNPs Genotyping System and single base extension (SBE) followed by tag-array hybridization and allele-specific primer extension, are commercially available for performing SNP genotyping assays.
In particular, the above method can be performed by genotyping at least one SNP selected in the group consisting of rs153202, rs27430, rs2401987, rs26315, rs6881621, rs1445946, rs13356962, rs250339, rs10051929, rs12520877, rs4702173, rs17651023, rs6898592, rs17707609, rs17651165, rs31505, rs17614059, rs17651266, rs40985, rs716555, rs253315, rs2560852, rs7710976, rs153709, rs716436, rs4865756, rs11741738, rs9467731, rs3757142, rs3130285, rs3130286, rs185819, rs3130287, rs3134943, rs8365, rs3134940, rs1800684, rs3129876, rs3129881, rs9268658, rs8084, rs7192, rs3116713, rs442745, rs3106196, rs3218114, rs9529, rs3218086, rs7044343, rs10757145, rs1332322, rs4978111, rs7860490, rs7033259, rs6475473, rs6475474, rs10811438, rs10964779, rs3780491, rs10511909, rs2774272, rs10758194, rs1328898, rs2274592, rs2381164, rs10972149, rs10814123, rs10758268, rs12551429, rs3763613, rs3829078, rs6481654, rs9988732, rs2488024, rs11000780, rs10824037, rs10458656, rs2633310, rs2675662, rs2459446, rs2688614, rs6055551, rs3787566, rs4816050, rs13040505, rs730169, rs574628, rs1741296 and rs6076623. Of course, the skilled artisan can choose to genotype a determined group of 2, 3, 4, 5, 6-10, 11-20 or more than 20 SNPs. When performing this method, the skilled artisan can also attribute a weight to each of the genotyped SNPs, in order to obtain a score integrating the results of all SNP genotypes. Of course, the skilled artisan can also combine the information obtained by SNP genotyping as described above with other information such as, for example, information concerning the APOE ε4 allele in said individual.
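The weighting of genotyped SNPs mentioned above can be illustrated by a minimal sketch. The text does not specify how weights are chosen or combined, so the weighted sum below, the function name `weighted_snp_score`, the weights and the genotypes are all hypothetical and serve only to show one possible way of integrating several genotypes into a single score.

```python
from typing import Mapping


def weighted_snp_score(allele_counts: Mapping[str, int],
                       weights: Mapping[str, float]) -> float:
    """Return the sum of weight * allele count over the genotyped SNPs.

    allele_counts maps an rs identifier to the number of copies (0, 1 or 2)
    of the allele of interest carried by the individual.
    weights maps the same identifiers to a weight (hypothetical values here).
    """
    score = 0.0
    for snp, count in allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError(f"{snp}: allele count must be 0, 1 or 2")
        score += weights[snp] * count
    return score


# Hypothetical weights and genotypes, for illustration only.
example_weights = {"rs153202": 0.5, "rs27430": 0.25, "rs2401987": 1.0}
example_genotypes = {"rs153202": 2, "rs27430": 0, "rs2401987": 1}
example_score = weighted_snp_score(example_genotypes, example_weights)
```

In such a scheme, additional information (for example the APOE ε4 status) could be included as one more weighted term, but the choice of weights remains outside the scope of this sketch.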
The present invention also pertains to a method for in vitro predicting an increased risk, for an individual, of developing an earlier-onset form of Alzheimer's disease, comprising a step of performing a genotyping assay of at least one gene selected in the group consisting of myosin X, ciliary neurotrophic factor receptor, KIAA1462 and angiopoietin 4. In particular, this method can be performed by genotyping one, 2, 3, 4, 6 to 10 or even more SNP(s) selected in the group consisting of rs1124740, rs6118095, rs4813852, rs4816051, rs6118137, rs17651266, rs4702179, rs2562343, rs152342, rs6476454, rs3763615, rs1063205, rs2478839, rs7071535, rs2014144 and rs9337951.
By "earlier-onset form of Alzheimer's disease" is meant that an individual having a marker as mentioned above will statistically develop Alzheimer's disease at least one year earlier than a person not having this marker.
In a preferred embodiment, the above methods are performed with a sample from a Caucasian individual.
Another aspect of the present invention is a kit for performing a method as above described, comprising means for genotyping at least one SNP selected in the group consisting of rs153202, rs27430, rs2401987, rs26315, rs6881621, rs1445946, rs13356962, rs250339, rs10051929, rs12520877, rs4702173, rs17651023, rs6898592, rs17707609, rs17651165, rs31505, rs17614059, rs17651266, rs40985, rs716555, rs253315, rs2560852, rs7710976, rs153709, rs716436, rs4865756, rs11741738, rs9467731, rs3757142, rs3130285, rs3130286, rs185819, rs3130287, rs3134943, rs8365, rs3134940, rs1800684, rs3129876, rs3129881, rs9268658, rs8084, rs7192, rs3116713, rs442745, rs3106196, rs3218114, rs9529, rs3218086, rs7044343, rs10757145, rs1332322, rs4978111, rs7860490, rs7033259, rs6475473, rs6475474, rs10811438, rs10964779, rs3780491, rs10511909, rs2774272, rs10758194, rs1328898, rs2274592, rs2381164, rs10972149, rs10814123, rs10758268, rs12551429, rs3763613, rs3829078, rs6481654, rs9988732, rs2488024, rs11000780, rs10824037, rs10458656, rs2633310, rs2675662, rs2459446, rs2688614, rs6055551, rs3787566, rs4816050, rs13040505, rs730169, rs574628, rs1741296, rs6076623, rs1124740, rs6118095, rs4813852, rs4816051, rs6118137, rs4702179, rs2562343, rs152342, rs6476454, rs3763615, rs1063205, rs2478839, rs7071535, rs2014144 and rs9337951, and a notice of use mentioning the relevance of said SNP(s) in the prognosis of Alzheimer's disease.
In a preferred embodiment of said kit, the kit comprises means for genotyping at least two, and more preferably at least 5 SNPs selected in the above list.
Non-limitative examples of genotyping means which can be included in a kit according to the invention are oligonucleotides and restriction enzymes specific for said SNP(s). Oligonucleotide(s) included in the kit can be bound to a solid support, for example in a micro- or macro-array. Alternatively or complementarily, oligonucleotide(s) included in the kit can be labelled (for example TaqMan® probes, molecular beacons, Scorpion probes, etc.). Reagents for performing HPLC, RFLP, capillary electrophoresis, specific PCR (such as SNPplex™ or Amplifluor SNP genotyping), mass spectrometry etc., can also be included in the kit.
Other characteristics of the invention will also become apparent in the course of the description which follows of the biological assays which have been performed in the framework of the invention and which provide it with the required experimental support, without limiting its scope.
Examples
Materials and methods
Populations
Both analyzed sub-populations were obtained by drawing lots from two case-control studies, a French one (n=1370) and an American one (n=1700). The characteristics of these two sub-populations are described in table 4 below.
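 | French case-control sub-population: AD cases | French case-control sub-population: controls | American case-control sub-population: AD cases | American case-control sub-population: controls
n | 307 | 238 | 200 | 200
Mean age | 74.1 ± | 73.1 ± 8.5 | 77.2 ± 6.1 | 74.3 ± 5.8
Mean age at onset | 68.6 ± | - | 72.0 ± | -
% of men | 35 | 43 | 42 | 38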
Table 4
All samples were from individuals of Caucasian origin. The clinical diagnoses were made according to the NINCDS-ADRDA and DSM-III-R criteria.
Controls were recruited in the same geographical region as the individuals affected by AD (in the North of France or in the eastern region of Pennsylvania) and did not meet the criteria for dementia at the time of recruitment.
Chip-Genotyping
Using the HapMap database (http://www.hapmap.org), a set of 1156 polymorphisms was selected on 82 genes. These polymorphisms are considered to be Tag-SNPs, which capture the most complete genetic information possible (frequency of the rare allele > 10 % and r² < 0.8).
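The two selection criteria quoted above (rare-allele frequency above 10 % and r² below 0.8) can be made concrete with a small sketch. The text does not describe the selection algorithm itself, so the helper names, the haplotype-frequency form of r² and the example counts below are assumptions used for illustration, not the procedure actually applied on the HapMap data.

```python
def minor_allele_frequency(n_11: int, n_12: int, n_22: int) -> float:
    """Frequency of the rarer allele, computed from genotype counts."""
    total_alleles = 2 * (n_11 + n_12 + n_22)
    freq_allele_2 = (n_12 + 2 * n_22) / total_alleles
    return min(freq_allele_2, 1.0 - freq_allele_2)


def r_squared(p_a: float, p_b: float, p_ab: float) -> float:
    """Pairwise r2 between two biallelic SNPs.

    p_a and p_b are the frequencies of one allele at each SNP, and p_ab is
    the frequency of the haplotype carrying both alleles.
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


def passes_tag_criteria(maf: float, r2_with_selected: list) -> bool:
    """Keep a candidate if it is common enough and not redundant
    with any SNP already retained (thresholds taken from the text)."""
    return maf > 0.10 and all(r2 < 0.8 for r2 in r2_with_selected)


# Hypothetical genotype counts and haplotype frequencies.
example_maf = minor_allele_frequency(50, 40, 10)
example_r2_linked = r_squared(0.5, 0.5, 0.5)      # alleles always together
example_r2_unlinked = r_squared(0.5, 0.5, 0.25)   # independent alleles
```

Here a candidate with `example_maf` and an r² of 1.0 with an already selected SNP would be rejected as redundant, whereas one with an r² of 0.0 would be kept.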
The experiments were performed by a service provider, DNA Vision (Belgium), as described by the supplier (Affymetrix), using a GeneChip Hybridization Oven 640 and a GeneChip 3000 7G 4C scanner.
Among all analyzed SNPs, only twenty-one (1.8 %) could not be genotyped due to technological problems. All other SNPs were genotyped with a success rate greater than 90 %. Finally, seventy SNPs (6 %) were not in Hardy-Weinberg equilibrium (p < 0.05), the majority of them deviating only weakly from this equilibrium (60 %, with a probability between 0.01 and 0.05). In addition, a substantial proportion of these SNPs were located inside or near a potential CNV (Copy Number Variation).
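As an illustration of the Hardy-Weinberg check mentioned above, the following sketch computes a p-value from genotype counts. The text gives only the threshold (p < 0.05) and not the test used, so the chi-square goodness-of-fit test with one degree of freedom, the function name and the example counts are assumptions.

```python
import math


def hardy_weinberg_p_value(n_11: int, n_12: int, n_22: int) -> float:
    """Chi-square (1 degree of freedom) test of Hardy-Weinberg equilibrium."""
    n = n_11 + n_12 + n_22
    p = (2 * n_11 + n_12) / (2 * n)  # frequency of allele 1
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic SNP: no deviation can be measured
    observed = (n_11, n_12, n_22)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical genotype counts (11, 12, 22).
p_in_equilibrium = hardy_weinberg_p_value(25, 50, 25)
p_no_heterozygotes = hardy_weinberg_p_value(50, 0, 50)
```

With these hypothetical counts, the first SNP matches the expected proportions exactly, while the second, which has no heterozygotes, would be flagged at the p < 0.05 threshold.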
Statistical Analyses
SAS software, version 8.02, was used (SAS Institute, Cary, NC). The association of the 1156 SNPs with the risk of developing AD was estimated by logistic regression, adjusted for age, sex, the presence or absence of the ε4 allele and the recruitment centre. Three models were systematically tested: recessive, co-dominant and dominant.
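The three genetic models differ only in how a genotype is converted into the explanatory variable of the regression. The sketch below shows one common coding, with genotypes written 11, 12 and 22 and allele 2 taken as the allele of interest. The function name and the additive 0/1/2 coding of the co-dominant model are assumptions, and the adjusted logistic regression itself is not reproduced here.

```python
def encode_genotype(genotype: str, model: str) -> int:
    """Convert a genotype ("11", "12" or "22") into a regression covariate."""
    copies = {"11": 0, "12": 1, "22": 2}[genotype]  # copies of allele 2
    if model == "dominant":      # 12 + 22 versus 11
        return int(copies >= 1)
    if model == "recessive":     # 22 versus 11 + 12
        return int(copies == 2)
    if model == "co-dominant":   # effect grows with the number of copies
        return copies
    raise ValueError(f"unknown model: {model}")


# Coding of the three genotypes under each model.
codings = {
    model: [encode_genotype(g, model) for g in ("11", "12", "22")]
    for model in ("dominant", "recessive", "co-dominant")
}
```

The resulting covariate would then enter the regression alongside the adjustment variables (age, sex, ε4 status and recruitment centre).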
The model 12+22 versus 11 corresponds to a dominant model (that is, at least one copy of an allele is sufficient to modify the risk).
The model 22 versus 11+12 corresponds to a recessive model (that is, two copies of one allele are necessary to modify the risk).
The co-dominant model assumes that the risk increases with the number of copies of the allele.
Results and conclusions
Gene selection by transcriptomic studies
Previous linkage disequilibrium (LD) studies identified chromosomal regions likely to contain one or several genetic markers of AD. These regions sometimes extend over more than 95 cM.
From these data, a biochip was developed within the laboratory, allowing the study of the expression of 2741 ORFs (open reading frames) contained in these chromosomal regions of interest (Lambert, Testa et al. 2003). The expression level of these ORFs was compared in cerebral tissue from 12 individuals affected by AD and from 12 controls (selected on the quality of the extracted total RNA).
107 genes, differentially expressed in AD brains compared to control brains, were consequently identified (table 5) (Bensemain, Hot et al. 2007).
Locus | Distance in cM | Selected ORFs | Differentially expressed ORFs
Chr.1 | 50 | 393 | 13
Chr.5 | 50 | 174 | 6
Chr.6 | 40 | 535 | 24
Chr.9 | 55 | 230 | 12
Chr.10 | 95 | 415 | 15
Chr.12 | 40 | 306 | 11
Chr.20 | 50 | 239 | 9
Chr.21 | 58 | 267 | 6
Chr.X | 25 | 182 | 11
Table 5: number of differentially expressed genes in the various chromosomal regions of interest.
Association of identified genes with the risk of developing AD
To accelerate the analysis of the genetic association of these genes with the risk of developing AD, a medium-throughput genotyping chip (Affymetrix) was developed. From 82 genes, 1156 Tag-SNPs were selected from the HapMap database, as mentioned above.
These Tag-SNPs were analyzed on two sub-populations, French (n=545) and American (n=400), obtained by drawing lots from larger populations. Seventy Tag-SNPs were not in Hardy-Weinberg equilibrium in the combined control population (French and American), most of them deviating only weakly from this equilibrium (60 % with 0.01 < p < 0.05).
Among the remaining Tag-SNPs, and according to the three statistical models that were used (recessive, co-dominant and dominant), 89 Tag-SNPs located in 17 different genes were associated with the risk of developing AD in the combined population in at least one of the three models (p < 0.05). In addition, 4 of these genes also contain Tag-SNPs that modulate the age at onset of the pathology (cf. table 6 below and table 2 above).
Gene | Ref. Seq | Chr. | Chromosomal location | Differential expression | Number of studied SNPs | Number of associated SNPs | Association with age at onset
MYO10 | NM_012334 | 5 | 16,715,016-16,989,385 | +201% | 103 | 24 | 4
ITGA2 | NM_002203 | 5 | 52,320,913-52,426,365 | +215% | 26 | 3 |
BTN3A2 | NM_007047 | 6 | 26,473,377-26,486,525 | -45% | 13 | 2 |
TNXB | NM_019105 | 6 | 32,116,911-32,185,131 | +100% | 31 | 4 |
AGER | NM_001136 | 6 | 32,256,724-32,260,001 | -53% | 7 | 4 |
HLA-DRA | NM_019111 | 6 | 32,515,625-32,520,799 | -57% | 10 | 5 |
PHF1 | NM_002636 | 6 | 33,486,934-33,492,187 | -47% | 4 | 3 |
CCDN3 | NM_001760 | 6 | 42,010,650-42,017,530 | +157% | 9 | 3 |
IL-33/NF-HEV | NM_033439 | 9 | 6,231,678-6,247,982 | -66% | 4 | 1 |
FLJ20375 | AK000382 | 9 | 20,939,639-20,985,947 | -62% | 55 | 9 |
B4GALT1 | NM_001497 | 9 | 33,100,642-33,157,231 | -43% | 9 | 5 |
CNTFR | NM_001842 | 9 | 34,541,431-34,579,722 | +132% | 16 | 7 | 2
CA9 | NM_001216 | 9 | 35,663,915-35,671,152 | -50% | 11 | 1 |
KIAA1462 | NM_020848 | 10 | 30,343,390-30,376,806 | +262% | 18 | 3 | 5
CAMK2G | NM_001222 | 10 | 75,242,265-75,304,344 | -50% | 10 | 7 |
ANGPTL4 | NM_009641 | 20 | 801,604-844,605 | -65% | 42 | 6 | 5
SMOX | NM_175840 | 20 | 4,103,675-4,116,352 | -42% | 15 | 2 |
Table 6
The fact that the SNPs linked to the age at onset of AD differ from those associated with the risk of developing AD results from the combination of numerous factors:
• The statistical tools that have been used: logistic regression (qualitative / qualitative) versus analysis of variance (qualitative / quantitative)
• The statistical models that have been used (recessive, dominant or co-dominant)
• The frequency of the studied SNPs
• The variability in the measurement of the age at onset and its heterogeneity between individuals
• The greater or lesser linkage disequilibrium between different SNPs
Conclusion
This integrated approach, based on the study of genes i) located in the chromosomal regions of interest defined by genomic screening and ii) differentially expressed in AD brains compared to control brains, allowed the identification of 17 susceptibility genes for AD. The analysis of separated and/or combined genetic variants of the genes which have been identified as associated with the risk of developing AD according to the present study will allow developing tools for the diagnosis and prognosis of AD.
REFERENCES
Bensemain, F., D. Hot, et al. (2007). "Evidence for induction of the ornithine transcarbamylase expression in Alzheimer's disease." Mol Psychiatry.
Bertram, L., M. B. McQueen, et al. (2007). "Systematic meta-analyses of Alzheimer disease genetic association studies: the AlzGene database." Nat Genet 39(1): 17-23.
Campion, D., C. Dumanchin, et al. (1999). "Early-onset autosomal dominant Alzheimer disease: prevalence, genetic heterogeneity, and mutation spectrum." Am J Hum Genet 65(3): 664-70.
Cruts, M. and C. Van Broeckhoven (1998). "Molecular genetics of Alzheimer's disease." Ann Med 30(6): 560-5.
Farrer, L. A., L. A. Cupples, et al. (1997). "Effects of age, sex, and ethnicity on the association between apolipoprotein E genotype and Alzheimer disease. A meta-analysis. APOE and Alzheimer Disease Meta Analysis Consortium." Jama 278(16): 1349-56.
Kamboh, M. I. (2004). "Molecular genetics of late-onset Alzheimer's disease." Ann Hum Genet 68(Pt 4): 381-404.
Lambert, J. C., E. Testa, et al. (2003). "Relevance and limitations of public databases for microarray design: a critical approach to gene predictions." Pharmacogenomics J 3(4): 235-41.
Roberts, S. B., C. J. MacLean, et al. (1999). "Replication of linkage studies of complex traits: an examination of variation in location estimates." Am J Hum Genet 65(3): 876-84.