Transforming Growth Factor β (TGF-β) is a multifunctional cytokine that plays a key role in several biological processes such as cell replication, differentiation, migration, apoptosis, healing, bone formation, angiogenesis, and immune system regulation [1, 2]. It belongs to the TGF-β superfamily, which contains more than 30 cytokines, including activins, inhibins, bone morphogenetic proteins (BMPs), anti-Müllerian hormone (AMH), and growth differentiation factors (GDFs), in addition to TGF-β itself.
This extracellular dimeric protein is mainly produced by regulatory T cells, platelets, macrophages, neutrophils, bone, soft tissues, renal tubular cells, and also malignant cells [4, 5].
With a versatile role in regulation, TGF-β fosters tissue growth and morphogenesis in the embryo, whereas in mature tissues it activates cytostatic and cell death processes that maintain homeostasis. Therefore, deregulated TGF-β signaling has been implicated in multiple developmental disorders and in various human diseases, including cancer, fibrosis, and autoimmune diseases, as well as in transplant outcome; high levels of TGF-β1 have been described in many of these conditions [7-9].
Among the three homologous isoforms present in mammals (TGF-β1, TGF-β2, TGF-β3), TGF-β1 is the most abundant and ubiquitously expressed isoform [2, 10]. The regulation of TGF-β1 production occurs at many levels, including transcription, translation, secretion, and activation in the extracellular environment. In this context, we emphasize the influence of genetic variation on TGF-β1 production, represented by single-nucleotide polymorphisms (SNPs), based on a large number of studies that have shown the significance of SNPs for TGF-β1 expression and disease development.
TGF-β1 is synthesized as a pre-pro-TGF-β1 monomer that consists of 390 amino acid residues, comprising an N-terminal signal peptide of 29 amino acids (c.+1 to c.+87, +1 from the translation start site based on DNA sequence), a latency-associated peptide (LAP) of 249 amino acids (c.+88 to c.+835), and a C-terminal sequence of 112 amino acids (c.+836 to c.+1172) corresponding to the mature TGF-β1 [12, 13].
The signal peptide is removed during translocation across the rough endoplasmic reticulum membrane, where two monomers dimerize through three disulfide bonds, two involving cysteine residues in LAP (positions 223 and 225) and one involving a cysteine residue in the mature TGF-β1 peptide (position 356), forming the pro-TGF-β1 homodimer [14, 15]. Pro-TGF-β1 is cleaved between amino acid residues 278 and 279 by the endoprotease furin convertase within the Golgi apparatus, which separates the LAP homodimer from the mature TGF-β1 homodimer. The two homodimers remain attached by noncovalent bonds, forming the small latent complex (SLC) [13, 15, 16].
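The residue numbering used above follows directly from the segment lengths given for pre-pro-TGF-β1. The following is a minimal sketch, not taken from the cited studies, that derives 1-based residue ranges from those lengths and checks where the cysteines and the cleavage site fall. The names `SEGMENTS`, `residue_ranges`, and `segment_of` are hypothetical helpers introduced only for illustration.

```python
from itertools import accumulate

# Segment lengths in amino acids, as stated in the text (N- to C-terminus).
SEGMENTS = [("signal peptide", 29), ("LAP", 249), ("mature TGF-β1", 112)]


def residue_ranges(segments):
    """Return {name: (first_residue, last_residue)} with 1-based numbering."""
    ends = list(accumulate(length for _, length in segments))
    return {
        name: (end - length + 1, end)
        for (name, length), end in zip(segments, ends)
    }


def segment_of(residue, ranges):
    """Return the name of the segment that contains a 1-based residue number."""
    for name, (first, last) in ranges.items():
        if first <= residue <= last:
            return name
    raise ValueError(f"residue {residue} is outside the precursor")


RANGES = residue_ranges(SEGMENTS)

if __name__ == "__main__":
    for name, (first, last) in RANGES.items():
        print(f"{name}: residues {first}-{last}")
    for cys in (223, 225, 356):
        print(f"Cys{cys} lies in {segment_of(cys, RANGES)}")
```

Under these lengths, LAP ends at residue 278 and the mature peptide starts at residue 279, which agrees with the cleavage position described in the text.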
Subsequently, the SLC and the latent TGF-β-binding protein (LTBP) are covalently linked to form the large latent complex (LLC) [17, 18]. After secretion, the LLC binds to the extracellular matrix, where it is retained until its activation, which depends on enzymes and other convertases, as well as on low pH and irradiation-induced reactive oxygen species (ROS) production in the local environment [19-22].
Gene structure and regulation of TGF-β1 transcription
The TGF-β1 gene is located in chromosomal region 19q13.2 and comprises 7 exons separated by 6 very large introns. Exon 1 encodes the 5’UTR (5’ untranslated region), the signal peptide, and part of the latency-associated peptide (LAP), whose coding sequence continues until codon 249 in exon 5. The remainder of exon 5 through exon 7 encodes the mature TGF-β1 [24, 25].
TGF-β1 transcription is regulated by approximately 3 kb of DNA sequence (c.-2665 to c.+423). Several regulatory elements have been described, among them the 5’UTR (c.-839 to c.-1), which contains Promoter region 2 (c.-839 to c.-568); Promoter region 1 (c.-1292 to c.-1161); Negative regulatory region 1 (c.-1570 to c.-1293); Enhancer region 1 (c.-1971 to c.-1571); Negative regulatory region 2 (c.-2201 to c.-1972); and Enhancer region 2 (c.-2665 to c.-2204), as shown in figure 1 [26, 27].
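Because each polymorphism discussed in this review is named by its c. coordinate, the regulatory element it falls in can be looked up from the intervals listed above. The sketch below is only an illustration of that lookup. The `REGIONS` table transcribes the coordinates from the text, and `regions_at` is a hypothetical helper, not part of any published tool.

```python
# Regulatory regions as listed in the text, in c. coordinates
# (negative values are upstream of the translation start site).
REGIONS = [
    ("Enhancer region 2", -2665, -2204),
    ("Negative regulatory region 2", -2201, -1972),
    ("Enhancer region 1", -1971, -1571),
    ("Negative regulatory region 1", -1570, -1293),
    ("Promoter region 1", -1292, -1161),
    ("Promoter region 2", -839, -568),
    ("5'UTR", -839, -1),
]


def regions_at(position):
    """Return the names of all listed regions that contain a c. position.

    A list is returned because Promoter region 2 lies inside the 5'UTR,
    so a position can belong to both; an empty list means the position
    is outside every interval listed in the text.
    """
    return [name for name, start, end in REGIONS if start <= position <= end]


if __name__ == "__main__":
    # Positions of polymorphisms discussed in this review.
    for pos in (-2725, -2389, -1985, -1638, -1347, -1287, -1154, -827, -387, -14):
        hits = regions_at(pos) or ["no listed region"]
        print(f"c.{pos}: {', '.join(hits)}")
```

With these intervals, c.-2725 and c.-1154 fall outside every listed region, while c.-827 lies in both Promoter region 2 and the 5'UTR.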
Members of the Activator Protein 1 (AP-1) transcription factor family, Specificity protein 1 (Sp1), Signal Transducer and Activator of Transcription 3 (STAT-3), Nuclear Factor kappa B (NF-κB), Early Growth Response 1 (Egr-1), and other transcription factors may orchestrate the levels of TGF-β1 through recognition of specific sequences in the regulatory region of the TGF-β1 gene [27-33].
AP-1 and Sp1 are regulators of transcriptional activation. AP-1 recognizes two binding sites in TGF-β1 Promoter region 1, and Sp1 recognizes five binding sites located between c.-1161 and c.-898 [34, 35]. Furthermore, it was reported that another transcription factor, the WT1 protein, may repress TGF-β1 gene expression through recognition of a response element (c.-957 to c.-949). Curiously, the same study found that Egr-1 may activate TGF-β1 gene expression through the same response element. Since transcription factors depend on recognition of specific binding sites to regulate gene expression, genetic polymorphisms in the regulatory region could modulate transcription factor binding and thereby alter TGF-β1 expression.
TGF-β1 functional polymorphisms
Genetic polymorphisms are inherited variations in the DNA sequence that occur in more than 1% of a population. The term “polymorphism” refers to the presence of different genotypes/alleles of a particular gene, and can occur as single-nucleotide polymorphisms (SNPs), deletions/insertions, polymorphic repetitive elements, and microsatellite variations.
SNPs are the most common type of polymorphism and are usually found in areas flanking protein-coding genes that are critical for microRNA binding and gene expression regulation, as well as in coding sequences, introns, or intergenic regions [37-39]. Accordingly, SNPs may influence gene expression, messenger RNA (mRNA) stability, alternative splicing, microRNA target sequences, and protein export to the endoplasmic reticulum via signal peptides, or alter protein function when an amino acid is changed.
The TGF-β1 gene presents various polymorphisms that can be classified as functional, non-functional, or with undetermined function. Until now, 8 SNPs and one deletion/insertion polymorphism have been reported to be associated with a functional impact on TGF-β1 production (table 1).
c.-1638G > A SNP
The c.-1638G > A SNP (rs1800468), commonly identified as -800G > A, is located in enhancer region 1. Allele A reduces the binding affinity of the cAMP response element-binding protein (CREB) family and is associated with lower TGF-β1 levels [41-43].
c.-1347C > T SNP
The c.-1347C > T SNP (rs1800469), commonly identified as -509C > T, is located in the first negative regulatory region and is associated with differential TGF-β1 gene expression and plasma levels. Individuals with the TT genotype show increased TGF-β1 gene expression in comparison to individuals with the CC genotype [45, 46]. Furthermore, c.-1347T allele carriers have almost double the plasma levels of c.-1347C allele carriers, in a dose-response relationship. Corroborating these findings, in vitro studies using TGF-β1 promoter-luciferase reporter plasmids demonstrated that the c.-1347T allele increases relative luciferase activity compared to the c.-1347C allele [44, 45, 47, 48]. Taken together, these data support an influence of the c.-1347C > T SNP on TGF-β1 gene expression and hence on plasma levels.
Changes in the nucleotide sequence of transcription factor binding sites may be responsible for the differential TGF-β1 gene expression associated with the c.-1347C > T SNP. As hypothesized by Shah et al. (2006), increased TGF-β1 levels might result from c.-1347T because of the loss of negative regulation. Their results showed that AP-1 is recruited only when allele C is present at the c.-1347 position, and that this recruitment reduces luciferase activity. In addition, the transcription factor hypoxia-inducible factor 1A (HIF1A) binds a site surrounding the c.-1347 position and competes with AP-1 when allele C is present.
Other effects of this SNP involve the transcription factor Yin-Yang 1 (YY1); in cotransfection experiments with a YY1 vector, the presence of the c.-1347T allele was shown to increase YY1 binding and to elevate relative luciferase activity in a dose-response fashion.
c.+29C > T AND c.+74G > C
The c.+29C > T SNP (rs1800470), also identified as +869C > T and Pro10Leu, and the c.+74G > C SNP (rs1800471), also identified as +915G > C and Arg25Pro, are located in the signal peptide sequence. Signal peptides comprise three regions: a positively charged N-terminal region, a central hydrophobic core, and a polar C-terminal region. The c.+29C > T SNP is located in the hydrophobic core, and both alleles encode apolar amino acids at amino acid position 10 (allele C encodes proline and allele T encodes leucine). Conversely, the c.+74G > C SNP at amino acid 25 corresponds to a change from a large polar amino acid (arginine, encoded by allele G) to a small apolar one (proline, encoded by allele C), and is located close to the cleaved region of pro-TGF-β which gives rise to LAP and mature TGF-β.
In an in vitro study, the c.+29C allele was shown to increase TGF-β1 secretion compared with c.+29T. Moreover, the serum concentration of TGF-β1 was found to be higher in individuals with the c.+29CC genotype than in those with the c.+29CT or c.+29TT genotype [46, 52-54], and higher with the c.+74G allele than with the c.+74C allele.
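The correspondence between the coding positions and the amino acid changes of these two SNPs (c.+29 with Pro10Leu, c.+74 with Arg25Pro) can be checked with simple codon arithmetic. The sketch below assumes the standard genetic code and c. numbering that starts at +1 on the first base of the start codon. The helpers `codon_location` and `substitutions` are hypothetical names used for illustration, and the actual codons of the TGF-β1 gene are not assumed: all codons for the reference amino acid are enumerated instead.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering (* = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_location(c_pos):
    """Map a coding position c.+N (N >= 1) to (codon number, position in codon)."""
    if c_pos < 1:
        raise ValueError("only positions in the coding sequence are supported")
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def substitutions(ref_base, alt_base, ref_aa, pos_in_codon):
    """Amino acids that can result from ref_base > alt_base at a codon position.

    Every codon that encodes ref_aa and carries ref_base at pos_in_codon
    (1-based) is mutated, and the set of resulting amino acids is returned.
    """
    i = pos_in_codon - 1
    outcomes = set()
    for codon, aa in CODON_TABLE.items():
        if aa == ref_aa and codon[i] == ref_base:
            mutated = codon[:i] + alt_base + codon[i + 1:]
            outcomes.add(CODON_TABLE[mutated])
    return outcomes


if __name__ == "__main__":
    print(codon_location(29))                # codon 10, second base
    print(codon_location(74))                # codon 25, second base
    print(substitutions("C", "T", "P", 2))   # proline codons, C > T
    print(substitutions("G", "C", "R", 2))   # arginine codons, G > C
```

Both positions fall on the second base of their codon. A C > T change there converts any proline codon (CCN) into a leucine codon (CTN), and a G > C change converts the CGN arginine codons into proline codons, consistent with the Pro10Leu and Arg25Pro designations.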
c.-387C > T
The c.-387C > T SNP (rs11466316) is located in the 5’UTR in exon 1, and the nearby sequence is a partial Sp1/Sp3 consensus site. Based on EMSA (electrophoretic mobility shift assay), Sp1 and Sp3 were shown to bind to a c.-387C probe but not to a c.-387T probe. To demonstrate this functional association, a luciferase reporter assay was performed, which showed that the c.-387T allele reduces TGF-β1 promoter activity five-fold when compared with the c.-387C allele.
Additionally, using luciferase reporter constructs transfected into HT1080 human fibrosarcoma cells together with an Sp1 expression vector, it was shown that the c.-387T allele is less responsive to stimulation by Sp1 than the c.-387C allele. Furthermore, the effect of this SNP on gene expression was investigated using genotype-specific dermal fibroblasts, and the c.-387CC genotype was observed to present a higher expression level than the c.-387CT or c.-387TT genotype. According to the authors, these data suggest that Sp1, and possibly Sp3, differentially regulate TGF-β1 expression depending on the allele present at this SNP.
c.+791C > T
The c.+791C > T SNP (rs1800472), also identified as Thr263Ile and +788C > T, is located in exon 5. Using the luciferase reporter assay, it was observed that Ile263 (c.+791T) led to an increase in relative luciferase activity when compared with Thr263 (c.+791C). However, no difference was detected in the concentration of active or total TGF-β1 levels. Thys et al. hypothesized that the absence of a difference in total TGF-β1 concentration could be explained by an effect of the Ile263 variant on the activation of TGF-β1 and not on secretion. As explained by previous observations for monogenic TGF-β1 mutations, a 35-fold increase in TGF-β1 activity may correlate with only a 2-fold increase in the concentration of active TGF-β1 [24, 56].
To explain the effect of the c.+791C > T SNP, Thys et al. proposed, based on the PROSITE database (a database of protein domains, families, and functional sites), that amino acid Thr263 is a casein kinase II phosphorylation site. Casein kinase II is a serine/threonine kinase whose activity is independent of cyclic nucleotides and calcium. Although no experimental evidence exists, it is possible that the loss of this phosphorylation site affects TGF-β1 activation. Alternatively, the authors suggest that changes in this amino acid could lead to more efficient cleavage of the mature peptide from the LAP, either through a direct effect, since this site is only 15 amino acids away from the cleavage site, or through an indirect effect mediated by a conformational change in the LAP.
c.-1287G > A
The c.-1287G > A SNP (rs11466314) is located in the first TGF-β1 promoter region. Two undetermined nuclear protein complexes were described to bind to the promoter region carrying c.-1287A with higher affinity than to the region carrying c.-1287G. This was further associated with an increased relative luciferase activity observed with c.-1287A versus c.-1287G. These results indicate that this SNP significantly affects TGF-β1 transcription.
c.-2389_-2391insAGG
The c.-2389_-2391insAGG polymorphism (rs11466313), also identified as -1550DEL/AGG, is located in the second enhancer region [26, 57]. Based on in silico analyses performed by Healy et al. (2009), it was predicted that the deletion leads to both a loss and a gain of transcription factor binding sites. However, an experimental analysis showed only a gain of a protein complex in the presence of this deletion. Thus, the functional relevance of this polymorphism requires further investigation.
c.-2725G > A
The c.-2725G > A SNP (rs2317130) is located in an undetermined region upstream of the second enhancer region. For this SNP, functional evaluation demonstrated a difference in complex binding affinity: the c.-2725A allele showed high-affinity complex formation, whereas the c.-2725G allele showed weak affinity. The influence of this SNP on gene expression and the transcription factors involved were not evaluated; nevertheless, this polymorphism may be functional based on the differential complex formation.
c.-1985C > G, c.-1154C > T, c.-827G > C, AND c.-14G > A
The c.-1985C > G SNP (rs3087453), present in the second negative regulatory region; the c.-1154C > T SNP (rs35318502); the c.-827G > C SNP (rs11466315), present in the second promoter region within the 5’UTR; and the c.-14G > A SNP (rs9282871), present in the 5’UTR, are SNPs with no effect on nuclear protein binding. Additionally, c.-1154C > T demonstrated no influence on gene expression in reporter assays, suggesting that this SNP probably has no impact on TGF-β1 expression.
As discussed by Shah et al., some of these SNPs might nevertheless affect the TGF-β1 regulatory region. For instance, SNPs have been shown to influence reporter gene activity without demonstrating exclusive recruitment of transcription factor(s) by EMSA. Alternatively, SNPs in the 5’UTR might alter post-transcriptional events, such as mRNA processing or stability.
Clinical implication of TGF-β1 functional polymorphisms
The TGF-β1 functional polymorphisms have been associated with different types of diseases (table 2) and could be used as susceptibility biomarkers. However, a simple association between one polymorphism and a disease may not justify the use of that polymorphism as a biomarker, since several different polymorphisms affect TGF-β1 production. A haplotype is a cluster of alleles present at a locus that are inherited together, and haplotype analysis may therefore be a more robust method to reveal the association between SNPs and disease susceptibility.
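One practical reason haplotype analysis differs from single-SNP analysis is that unphased genotypes do not always identify which alleles travel together on the same chromosome. The sketch below is an illustration under simple assumptions (biallelic sites, one unphased genotype per site, hypothetical example genotypes). It enumerates the haplotype pairs compatible with a multi-site genotype and is not a substitute for the statistical phasing used in association studies. `compatible_haplotype_pairs` is a hypothetical helper name.

```python
from itertools import product


def compatible_haplotype_pairs(genotypes):
    """Enumerate unordered haplotype pairs consistent with unphased genotypes.

    genotypes: list of (allele_a, allele_b) tuples, one per site, in the
    order in which the sites appear along the gene.
    """
    pairs = set()
    # At each site, choose which of the two alleles sits on the first chromosome;
    # the other allele then sits on the second chromosome.
    for choice in product((0, 1), repeat=len(genotypes)):
        hap1 = "".join(g[c] for g, c in zip(genotypes, choice))
        hap2 = "".join(g[1 - c] for g, c in zip(genotypes, choice))
        pairs.add(tuple(sorted((hap1, hap2))))
    return sorted(pairs)


if __name__ == "__main__":
    # Hypothetical individual, heterozygous at both c.-1347 (C/T) and c.+29 (C/T).
    print(compatible_haplotype_pairs([("C", "T"), ("C", "T")]))
    # Hypothetical individual, homozygous T at c.-1347 and heterozygous at c.+29.
    print(compatible_haplotype_pairs([("T", "T"), ("C", "T")]))
```

A double heterozygote is compatible with two different haplotype pairs, whereas an individual heterozygous at only one site has a single possible pair, which is why haplotypes usually have to be inferred or determined by family or molecular data rather than read directly from genotypes.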
In conclusion, eight polymorphisms have been described to affect TGF-β1 production and have been associated with disease susceptibility. Some of these act at the transcriptional level by affecting transcription factor binding, while others act at the level of protein production. To elucidate the correlation between TGF-β1 polymorphisms and disease susceptibility, haplotype analysis is needed to determine more effectively the combined influence of polymorphisms that are inherited together. However, the functional significance of many TGF-β1 polymorphisms remains unclear, and further studies are required to elucidate the effect of TGF-β1 polymorphisms and haplotypes on disease development.
This study was supported by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Fundação Araucária, Secretaria da Ciência, Tecnologia e Ensino Superior (SETI) and the Londrina State University Graduate Coordination (PROPPG-UEL).
Financial support: none. Conflict of interest: none.