
Chapter 1 Exercise Solutions

1.1 An SNP (single-nucleotide polymorphism) is a variation in a single base at a specific location in the DNA strand where at least two different nucleotides appear in the human population. SNPs can affect gene transcription, mRNA transcript stability and the amino acid sequences of proteins, which in turn can affect protein function. Many SNPs are nonfunctional: they have no effect whether or not they are present in the genome. They are important in genome-wide association studies (GWAS) because their presence can be linked to certain complex diseases.
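To make the definition concrete, here is a minimal Python sketch that scans equal-length, already aligned sequences and reports every position where at least two different nucleotides occur. The function name `find_snps` and the toy sequences are hypothetical. Real SNP calling typically also applies allele-frequency and quality thresholds, and this sketch omits them.

```python
from collections import Counter

def find_snps(aligned):
    """Return {position: allele counts} for columns with at least two nucleotides.

    Hypothetical helper: assumes the sequences are already aligned to the same
    length; gaps or ambiguous bases (anything other than A, C, G, T) are ignored.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    snps = {}
    for pos in range(length):
        counts = Counter(s[pos] for s in aligned if s[pos] in "ACGT")
        if len(counts) >= 2:  # at least two different nucleotides at this site
            snps[pos] = dict(counts)
    return snps
```

For example, `find_snps(["ACGT", "ACGA", "TCGT"])` returns `{0: {'A': 2, 'T': 1}, 3: {'T': 2, 'A': 1}}`. Positions 1 and 2 are identical in every sequence, so they are not reported.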

1.2 Microarrays and Next Generation Sequencing.

1.3 Pharmacogenomic testing.

1.4 Three differences between Affymetrix microarrays and Illumina BeadChip microarrays:

• Density of arrays (Illumina has higher density).
• Position of oligonucleotides on arrays (Affymetrix oligos sit at known, predefined positions; Illumina oligos sit at random positions).
• Location of oligonucleotides (Affymetrix oligos are attached to glass by photolithography; Illumina oligos are attached to silica beads).

1.5 Massively parallel sequencing approaches have at least five things in common:

• Fast and simple library preparation.
• Ligation of adapters.
• DNA fragment amplification with PCR.
• Sequencing in repeated cycles, in each of which the incorporated nucleotide is determined.
• Extensive bioinformatic approaches for data analysis.

1.6 (1) DNA isolation, (2) DNA fragmentation, (3) Ligation of adapters, (4) Bridge amplification, (5) Sequencing by synthesis, (6) Data analysis.

1.7 Two examples of genome projects are the Saudi Human Genome Program, which targets the sequencing of 100,000 human genomes,¹ and a private initiative in Iceland to sequence 2,636 full genomes along with genetic information from more than 100,000 other Icelanders.²

¹ http://pulse.embs.org/november-2015/the-saudi-human-genome-program/

² https://www.wired.com/2015/03/iceland-worlds-greatest-genetic-laboratory/

Analyzing Network Data in Biology and Medicine, 1e, Nataša Pržulj, Solution Manual

1.8 This sample solution gives a short introduction to the command lines used in the analyses. Please refer to the GATK workshop tutorials for a full understanding of the algorithms, platform requirements and data prerequisites.³

(a) Three software packages must be installed to run a basic variant calling procedure:

• Picard: http://broadinstitute.github.io/picard/
• Samtools: http://samtools.sourceforge.net
• GATK: https://software.broadinstitute.org/gatk/

Bear in mind that GATK is supported only on Linux/Unix and Mac OS X systems.

(b) The variant calling algorithms take as input a reference fasta sequence (e.g., the human reference genome) and a sequencing .bam file (i.e., read data from the sequenced samples). The fasta file must be prepared for use as a reference:

i. Read the reference fasta sequence, refsequence.fa, and create a dictionary of contig names, refsequence.dict, by using the command CreateSequenceDictionary from picard.jar:

    java -jar picard.jar CreateSequenceDictionary R=refsequence.fa O=refsequence.dict

ii. Read the reference fasta sequence, refsequence.fa, and prepare a fasta index file, refsequence.fa.fai, by using the command faidx from samtools:

    samtools faidx refsequence.fa

(c) Run the algorithms to call the variants in the sequencing data. In its basic use, the GATK command takes the algorithm's name (UnifiedGenotyper or HaplotypeCaller), the sequencing data (reads.bam) and the reference fasta sequence (refsequence.fa; the dictionary and fasta index files are read from the same directory), and generates a .vcf file with the called variants (variants.vcf). The commands below use the GATK 3.x command-line syntax:

    java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R refsequence.fa -I reads.bam -o variants.vcf
    java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R refsequence.fa -I reads.bam -o variants.vcf

Among other parameters, you can specify a calling confidence threshold to filter out low-quality variants, and you can restrict the search to an interval of the .bam file.

³ https://software.broadinstitute.org/gatk/documentation/topic?name=tutorials
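The preparation and calling commands above can be scripted. The following is a minimal Python sketch under several assumptions: the file names used above, Picard and GATK 3.x jars named picard.jar and GenomeAnalysisTK.jar in the working directory, and samtools on the PATH. The helper names `variant_calling_commands` and `run_pipeline` are hypothetical. By default the script only prints the commands (a dry run).

```python
import shlex
import subprocess

def variant_calling_commands(reference, bam, vcf, caller="HaplotypeCaller",
                             picard_jar="picard.jar", gatk_jar="GenomeAnalysisTK.jar"):
    """Return the commands of steps (b) and (c) as argument lists (GATK 3.x syntax)."""
    if caller not in ("UnifiedGenotyper", "HaplotypeCaller"):
        raise ValueError("caller must be 'UnifiedGenotyper' or 'HaplotypeCaller'")
    # GATK expects the dictionary next to the fasta, with the extension replaced by .dict
    dictionary = reference.rsplit(".", 1)[0] + ".dict"
    return [
        ["java", "-jar", picard_jar, "CreateSequenceDictionary",
         f"R={reference}", f"O={dictionary}"],
        ["samtools", "faidx", reference],  # writes <reference>.fai
        ["java", "-jar", gatk_jar, "-T", caller,
         "-R", reference, "-I", bam, "-o", vcf],
    ]

def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step

if __name__ == "__main__":
    run_pipeline(variant_calling_commands("refsequence.fa", "reads.bam", "variants.vcf"))
```

Keeping each command as an argument list, rather than a single shell string, avoids quoting problems with unusual file names. `check=True` makes the pipeline stop if, for example, the dictionary step fails.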

1.9 Substitute the allele frequency ($p_s = 0.1$) and the genotype relative risk ($r = 3$) into the table to obtain the relative genotype frequencies:

Relative genotype frequencies

| Group | AA | Aa | aa |
|---|---|---|---|
| Cases | 0.5625 | 0.3750 | 0.0625 |
| Controls | 0.8100 | 0.1800 | 0.0100 |

Calculate the absolute genotype frequencies in each group by multiplying the relative frequencies by $n_d$ (cases) and $n_c$ (controls), here $n_d = n_c = 50$:

Absolute genotype frequencies

| Group | AA | Aa | aa |
|---|---|---|---|
| Cases | 28 | 19 | 3 |
| Controls | 40 | 9 | 1 |

Generate the individuals' genotypes and code their status as 1 (case) or 0 (control):

| Patient id | 1–40 | 41–49 | 50 | 51–78 | 79–97 | 98–100 |
|---|---|---|---|---|---|---|
| Genotype | AA | Aa | aa | AA | Aa | aa |
| Status | 0 | 0 | 0 | 1 | 1 | 1 |
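The arithmetic in 1.9 can be sketched in Python. The sketch makes two assumptions that the text does not state explicitly. First, the controls follow Hardy-Weinberg proportions. Second, the genotype relative risk acts multiplicatively, with weights 1, r and r² for AA, Aa and aa. With $p_s = 0.1$ and $r = 3$, these assumptions reproduce the relative frequencies in the first table. The rounding rule in `absolute_counts` is also an assumption: Aa and aa are rounded half-up and AA takes the remainder. It was chosen because it reproduces the tabulated counts. Exact fractions avoid floating-point ties such as 0.5 controls with genotype aa. All function names are hypothetical.

```python
from fractions import Fraction
import math

def genotype_frequencies(p, r):
    """Relative (AA, Aa, aa) frequencies in cases and controls.

    Assumptions (hypothetical model, not stated in the text): a is the risk
    allele with frequency p, controls are in Hardy-Weinberg proportions, and
    the genotype relative risk is multiplicative (weights 1, r, r**2).
    """
    q = 1 - p
    controls = (q * q, 2 * p * q, p * p)
    weights = (controls[0], r * controls[1], r * r * controls[2])
    total = sum(weights)
    cases = tuple(w / total for w in weights)  # renormalize so cases sum to 1
    return cases, controls

def absolute_counts(freqs, n):
    """Aa and aa rounded half-up, AA takes the remainder so the row sums to n."""
    het = math.floor(n * freqs[1] + Fraction(1, 2))
    hom = math.floor(n * freqs[2] + Fraction(1, 2))
    return (n - het - hom, het, hom)

def individuals(case_counts, control_counts):
    """(patient id, genotype, status) rows: controls first (0), then cases (1)."""
    rows = []
    for status, counts in ((0, control_counts), (1, case_counts)):
        for genotype, count in zip(("AA", "Aa", "aa"), counts):
            rows.extend([(genotype, status)] * count)
    return [(i + 1, genotype, status) for i, (genotype, status) in enumerate(rows)]

if __name__ == "__main__":
    cases, controls = genotype_frequencies(Fraction(1, 10), 3)
    print([float(f) for f in cases], [float(f) for f in controls])
    case_counts, control_counts = absolute_counts(cases, 50), absolute_counts(controls, 50)
    print(case_counts, control_counts)  # (28, 19, 3) (40, 9, 1)
    patients = individuals(case_counts, control_counts)
    print(patients[49], patients[50])   # (50, 'aa', 0) (51, 'AA', 1)
```

Under this model the case frequencies are 81/144, 54/144 and 9/144, that is, 0.5625, 0.375 and 0.0625. The individual list follows the same id ranges as the last table.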

1.10 Construct the contingency table for a recessive model:

| Group | AA+Aa | aa |
|---|---|---|
| Cases | 47 | 3 |
| Controls | 49 | 1 |

The chi-squared test with 1 d.f. (with Yates' continuity correction) gives $\chi^2 = 0.26$, p-value $= 0.6098$. No significant association between the SNP and the disease is detected under this model.

Construct the contingency table for a dominant model:

| Group | AA | aa+Aa |
|---|---|---|
| Cases | 28 | 22 |
| Controls | 40 | 10 |

The chi-squared test with 1 d.f. (with Yates' continuity correction) gives $\chi^2 = 5.56$, p-value $= 0.0184$. The SNP is associated with the disease under this model.

Under the recessive model, the risk of disease increases only when an individual has inherited two copies of the risk allele from his/her parents (aa). Because this genotype is infrequent in the population, finding a significant association requires larger samples than the one in this exercise ($n_d = 50$ cases). In contrast, the SNP is associated with the disease under the dominant model because we assumed that individuals show the phenotype (disease) when they carry at least one risk allele (either Aa or aa). These genotypes are more frequent, so a significant association can be found even in relatively small samples.
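The two tests in 1.10 can be reproduced with a short Python sketch. It collapses the genotype counts from Exercise 1.9 under each model. It then computes Pearson's chi-squared statistic with Yates' continuity correction, which is the variant that matches the reported values and the default of R's `chisq.test` for 2×2 tables. The 1-d.f. p-value comes from `math.erfc`. The helper names `collapse` and `chi2_2x2_yates` are hypothetical.

```python
import math

def collapse(cases, controls, model):
    """Build a 2x2 table from (AA, Aa, aa) counts; a is the risk allele."""
    if model == "recessive":   # AA+Aa versus aa
        return [[cases[0] + cases[1], cases[2]],
                [controls[0] + controls[1], controls[2]]]
    if model == "dominant":    # AA versus Aa+aa
        return [[cases[0], cases[1] + cases[2]],
                [controls[0], controls[1] + controls[2]]]
    raise ValueError("model must be 'recessive' or 'dominant'")

def chi2_2x2_yates(table):
    """Yates-corrected Pearson chi-squared statistic and its 1-d.f. p-value."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            deviation = max(0.0, abs(table[i][j] - expected) - 0.5)  # continuity correction
            stat += deviation ** 2 / expected
    # For 1 d.f., P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    return stat, math.erfc(math.sqrt(stat / 2.0))

if __name__ == "__main__":
    cases, controls = (28, 19, 3), (40, 9, 1)
    for model in ("recessive", "dominant"):
        stat, p = chi2_2x2_yates(collapse(cases, controls, model))
        print(f"{model}: chi2 = {stat:.2f}, p-value = {p:.4f}")
```

Running it prints $\chi^2 = 0.26$ (p-value 0.6098) for the recessive table and $\chi^2 = 5.56$ (p-value 0.0184) for the dominant table. Without the continuity correction, the recessive statistic would be about 1.04 instead.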

1.11 (a) The data set is generated in the same way as shown for Exercise 1.9, but the allele frequencies are drawn at random from a uniform distribution between 0.1 and 0.9.

(b) The logistic regression models (LRMs) can be fitted for each SNP in the R programming language with the function glm from the package stats. The p-values are adjusted with the Bonferroni correction to control the type I error rate.

(c)(d)(e) The exercise asks you to evaluate the sensitivity with the original setting (dataset I), then after increasing the sample size (dataset II), and then after increasing the relative risks as well (dataset III). An example solution follows. Because the allele frequencies are random, the reported values change from run to run, but the trend in the results stays the same:

$$\text{Sensitivity} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}},$$

where:

True Positives: number of SNPs with adjusted p-value $< \alpha$ and $r > 1$.

False Negatives: number of SNPs with adjusted p-value $\geq \alpha$ and $r > 1$.
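For readers without R, here is a pure-Python sketch of steps (b) to (e). It fits one logistic regression per SNP by Newton-Raphson, standing in for R's glm with a binomial family. It then computes a two-sided Wald p-value, applies the Bonferroni correction and computes the sensitivity as defined above. Several parts are assumptions: the additive genotype coding (0, 1 or 2 copies of the risk allele), the fixed number of iterations and the helper names. The sketch does not handle complete separation or monomorphic SNPs.

```python
import math

def logistic_wald(genotypes, status, iterations=25):
    """Fit logit P(case) = b0 + b1 * g by Newton-Raphson.

    genotypes: copies of the risk allele (0, 1, 2); additive coding is an assumption.
    status: 1 for cases, 0 for controls.
    Returns (b1, two-sided Wald p-value for b1).
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        s0 = s1 = i00 = i01 = i11 = 0.0
        for g, y in zip(genotypes, status):
            mu = 1.0 / (1.0 + math.exp(-(b0 + b1 * g)))
            w = mu * (1.0 - mu)
            s0 += y - mu            # score for b0
            s1 += (y - mu) * g      # score for b1
            i00 += w                # Fisher information entries
            i01 += w * g
            i11 += w * g * g
        det = i00 * i11 - i01 * i01
        b0 += (i11 * s0 - i01 * s1) / det   # Newton step: beta += I^-1 * score
        b1 += (i00 * s1 - i01 * s0) / det
    # [I^-1]_11 from the last iteration; the final step is negligible after convergence
    se_b1 = math.sqrt(i00 / det)
    return b1, math.erfc(abs(b1 / se_b1) / math.sqrt(2.0))

def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: min(1, m * p)."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

def sensitivity(adjusted_pvalues, relative_risks, alpha=0.05):
    """TP / (TP + FN) over the SNPs with relative risk r > 1."""
    causal = [(p, r) for p, r in zip(adjusted_pvalues, relative_risks) if r > 1]
    tp = sum(1 for p, _ in causal if p < alpha)
    fn = len(causal) - tp   # causal SNPs with adjusted p-value >= alpha
    return tp / (tp + fn)
```

On the data set from Exercise 1.9, coded additively, the estimated slope is positive. The Wald p-value is roughly 0.015, so the SNP is detected there on its own. Once that p-value is multiplied by the number of SNPs tested, however, it can easily exceed α. This is the conservativeness discussed next.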

Because the Bonferroni correction is highly conservative, the sensitivity does not improve when only the sample size is increased, but it does when both the sample size and the relative risk grow. Thus, the power of LRMs to detect disease-causing SNPs depends on well-defined signals of association in the data, which require a large cohort of patients and SNPs with strong effect sizes.

Sensitivity in detecting 10 SNPs associated with the disease:

| | Dataset I | Dataset II | Dataset III |
|---|---|---|---|
| True Positives | 0 | 0 | 8 |
| False Negatives | 10 | 10 | 2 |
| Sensitivity | 0 | 0 | 0.8 |

1.12 The dataset is partitioned into training and test sets to conduct a 5-fold cross-validation. The SVMs can be fitted in the R programming language with the function ksvm from the package kernlab, and the accuracy is assessed as follows:

$$\text{Accuracy} = \frac{\text{True Positives} + \text{True Negatives}}{\text{Number of patients}},$$

where:

True Positives: number of patients with disease status $= 1$ and predicted label $= 1$.

True Negatives: number of patients with disease status $= 0$ and predicted label $= 0$.
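Below is a minimal Python sketch of the 5-fold cross-validation and the accuracy defined above. The SVM itself is not reproduced. Instead, the classifier is passed in as a function `fit_predict(X_train, y_train, X_test)`, which is a hypothetical interface, so kernlab's ksvm or any other classifier would have to be wrapped to match it. The majority-class baseline is only a placeholder showing that interface. The sketch averages the five fold accuracies, which is one common choice; pooling the predictions across folds is another.

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and split into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def accuracy(labels, predictions):
    """(TP + TN) / number of patients, with status 1 = disease."""
    tp = sum(1 for y, yhat in zip(labels, predictions) if y == 1 and yhat == 1)
    tn = sum(1 for y, yhat in zip(labels, predictions) if y == 0 and yhat == 0)
    return (tp + tn) / len(labels)

def cross_validated_accuracy(X, y, fit_predict, k=5, seed=0):
    """Mean test-fold accuracy; fit_predict(X_train, y_train, X_test) -> labels."""
    scores = []
    for test_idx in k_fold_indices(len(y), k, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        predictions = fit_predict([X[i] for i in train_idx],
                                  [y[i] for i in train_idx],
                                  [X[i] for i in test_idx])
        scores.append(accuracy([y[i] for i in test_idx], predictions))
    return sum(scores) / k

def majority_class(X_train, y_train, X_test):
    """Placeholder classifier (not an SVM): predict the training majority label."""
    label = 1 if 2 * sum(y_train) >= len(y_train) else 0
    return [label] * len(X_test)
```

Each patient is used for testing exactly once and for training in the other four folds. Fixing `seed` makes the partition reproducible, so different classifiers can be compared on the same folds.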
