Loops

stringr


A colleague has produced a file with one DNA sequence on each line. Download the file and load it into R using read.csv(). The file has no header, so set header = FALSE when reading it.
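A minimal sketch of the loading step follows. The downloaded file's name is not given here, so the sketch uses read.csv()'s `text` argument with a few made-up sequences as a stand-in. With the real file you would pass its path instead. `stringsAsFactors = FALSE` keeps the sequences as character strings; it is already the default from R 4.0.0 onward, but older versions would otherwise convert them to factors.

```r
# Made-up sequences standing in for the colleague's file (hypothetical data).
# With the real file, replace `text = ...` with the file's path.
dna_data <- read.csv(text = "attggc\ngcgcta\naaattt",
                     header = FALSE,            # the file has no header row
                     stringsAsFactors = FALSE)  # keep sequences as character

# With header = FALSE, R names the single column V1
str(dna_data)
```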

Your colleague wants to calculate the GC content of each DNA sequence (i.e., the percentage of bases that are either G or C) and knows just a little R. They sent you the following code, which calculates the GC content of a single sequence:

library(stringr)

sequence <- "attggc"
Gs <- str_count(sequence, "g")  # number of g bases
Cs <- str_count(sequence, "c")  # number of c bases
gc_content <- (Gs + Cs) / str_length(sequence) * 100  # percentage of bases that are g or c

This code uses the excellent stringr package for working with the sequence data. You’ll need to install this package (e.g., with install.packages("stringr")) before using it.

Convert the last three lines of this code into a function to calculate the GC content of a DNA sequence.
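One possible sketch of such a function is shown below. The name `get_gc_content` is a hypothetical choice. Like the original code, it counts only lowercase "g" and "c", so it assumes the sequences are stored in lowercase.

```r
library(stringr)

# Hypothetical helper: GC content (percentage of bases that are g or c)
# of a single sequence. Counts lowercase "g" and "c" only, like the original code.
get_gc_content <- function(sequence) {
  Gs <- str_count(sequence, "g")
  Cs <- str_count(sequence, "c")
  gc_content <- (Gs + Cs) / str_length(sequence) * 100
  return(gc_content)
}

get_gc_content("attggc")  # 2 g + 1 c out of 6 bases -> [1] 50
```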

Use a for loop and your function to calculate the GC content of each sequence and print them out individually. The function should work on a single sequence at a time and the for loop should repeatedly call the function and print out the result.
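Here is a minimal sketch of the loop, assuming the sequences sit in column V1 (the name R gives the first column when header = FALSE). The helper `get_gc_content` and the three sequences are hypothetical stand-ins, repeated here so the block runs on its own.

```r
library(stringr)

# Hypothetical helper: GC content of a single lowercase sequence
get_gc_content <- function(sequence) {
  (str_count(sequence, "g") + str_count(sequence, "c")) / str_length(sequence) * 100
}

# Made-up stand-in for the colleague's file
dna_data <- read.csv(text = "attggc\ngcgcta\naaattt", header = FALSE,
                     stringsAsFactors = FALSE)

# Visit each sequence in column V1, compute its GC content, and print it
for (sequence in dna_data$V1) {
  gc <- get_gc_content(sequence)
  print(gc)
}
```

For these three made-up sequences the loop prints 50, 66.66667 and 0, one value per line.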

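You may have noticed that the for loop prints the results differently here. This is because read.csv() imports the data as a data frame, unlike the numeric vector in the previous exercise. Looping directly over a data frame steps through its columns rather than its individual values, so loop over the column of sequences instead (with header = FALSE this column is named V1).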
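The short sketch below illustrates the difference with a hypothetical one-column data frame shaped like the result of read.csv(..., header = FALSE). A data frame is a list of columns, so a for loop over it runs once per column. Extracting the column first gives a character vector, and the loop then runs once per sequence.

```r
# Hypothetical data shaped like read.csv(..., header = FALSE) output
dna_data <- data.frame(V1 = c("attggc", "gcgcta", "aaattt"),
                       stringsAsFactors = FALSE)

# Looping over the data frame visits columns: this runs only once,
# and `column` holds all three sequences at the same time
for (column in dna_data) {
  print(column)
}

# Extracting the column gives a character vector: this runs three times
sequences <- dna_data$V1
for (sequence in sequences) {
  print(sequence)
}
```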
