Learning Objectives
Following this assignment students should be able to:
- use version control to keep track of changes to code
- collaborate with someone else via a remote repository
Reading
Exercises
-- Set-up Git --
This exercise, like the rest of the Version Control Basics assignment, refers to the Data Management Review problem. You do not need to complete the Data Management Review exercise for this assignment, though we encourage the review and self-evaluation of your problem-solving wizardry.
You’re continuing your analyses of house-elves with Dr. Granger. Unfortunately you weren’t using version control and one day your cat jumped all over your keyboard and managed to replace your analysis code with:
asd;fljkzbvc;iobv;iojre,nmnmbveaq389320pr9c9cd ds8 a d8of8pp
before somehow hitting Ctrl-s and overwriting all of your hard work.
Determined not to let this happen again, you’ve committed to using git for version control.
Install git for your operating system following the setup instructions. Then create a new project for this assignment in RStudio with the following steps:
- File -> New Project -> New Directory -> Empty Project
- Choose where to put your project
- Select Create a git repository
- If everything worked, you should see a Git tab in the upper right corner of RStudio
-- First Commit --
This is a follow up to Set-up Git.
Create a new file for your analysis named houseelf-analysis.R and add a comment at the top describing what the analysis is intended to do.
Commit this file to version control with a good commit message. Then check to see if you can see this commit in the history.
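For reference, the same first commit can be made from a terminal instead of the Git tab in RStudio. The following is a minimal sketch and not part of the assignment: it works in a throwaway directory, and the identity, the comment text, and the commit message are all placeholders.

```shell
set -eu

# Work in a scratch directory so the sketch never touches your real project.
demo_dir="$(mktemp -d)"
cd "$demo_dir"

git init --quiet
# Placeholder identity, set for this scratch repository only.
git config user.name "Example Student"
git config user.email "student@example.com"

# Create the analysis file with a descriptive comment at the top (placeholder text).
echo "# Analysis of house-elf ear length and DNA data" > houseelf-analysis.R

# Stage the file, commit it, then check that the commit shows up in the history.
git add houseelf-analysis.R
git commit --quiet -m "Add analysis script with a description of its purpose"
git log --oneline
```

Staging with git add and committing with git commit correspond to ticking the Staged box and clicking Commit in RStudio; git log plays the role of the History view.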
-- Importing Data --
This is a follow up to First Commit.
- Download a copy of the main data file and save it to a data subdirectory in your project folder.
- Commit this file to version control.
- Add some code to houseelf-analysis.R that imports the data into R.
- Commit these changes to version control.
-- Commit Multiple Files --
This is a follow up to Importing Data.
After talking with Dr. Granger you realize that houseelf_earlength_dna_data.csv is only the first of many files to come. To help keep track of the files you’ll need to number them, so rename the current file to houseelf_earlength_dna_data_1.csv and change your R code to reflect this name change.
Git will initially think you’ve deleted houseelf_earlength_dna_data.csv and created a new file, houseelf_earlength_dna_data_1.csv. But once you click on both the old and new files to stage them, git will recognize that the file has been renamed and indicate this with an R.
In a single commit, add the renaming of the data file and the changes to the R file.
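The rename can also be followed from a terminal. This is a minimal sketch under stated assumptions: it runs in a throwaway repository, the file contents and identity are placeholders rather than the real data or script, and it uses git mv, which renames the file and stages both sides of the rename in one step.

```shell
set -eu

# Scratch repository standing in for your project (placeholder identity and file contents).
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet
git config user.name "Example Student"
git config user.email "student@example.com"

mkdir data
echo "placeholder contents, not the real data" > data/houseelf_earlength_dna_data.csv
echo "# placeholder line naming data/houseelf_earlength_dna_data.csv" > houseelf-analysis.R
git add .
git commit --quiet -m "Add data file and analysis script"

# Rename the data file; git mv renames it on disk and stages the rename.
git mv data/houseelf_earlength_dna_data.csv data/houseelf_earlength_dna_data_1.csv

# Update the script to use the new name (placeholder edit), then stage it.
echo "# placeholder line naming data/houseelf_earlength_dna_data_1.csv" > houseelf-analysis.R
git add houseelf-analysis.R

# The rename should be listed with an R and the script with an M.
git status --short

# One commit records both changes.
git commit --quiet -m "Number the data file and update the import path"
```

Keeping the rename and the code change in one commit means every commit in the history leaves the script pointing at a file that actually exists.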
-- Adding a Remote --
This is a follow up to Commit Multiple Files.
Dr. Granger contacts you and lets you know that she’d like to be able to see what you’ve been doing and to share some more files with you. She’s been learning version control herself while on sabbatical, so she suggests that you use a shared git repository on the hosting site GitHub.
- Create an account on GitHub.
  - If you want to work in a public repository you can create one by clicking on the + button in the top right hand corner of the GitHub website. If you’d rather have a private repository for class, email your username to your professor and they will create a repository for you.
- Connect your local git repository to your remote repository on GitHub.
  - Click on the icon with the word More next to it and select Shell.
  - Go to the GitHub webpage for your repository and copy the lines of code under push an existing repository from the command line.
  - Paste them into the Shell.
  - Press enter.
- Go back to the GitHub webpage for your repository and you should see your files.
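The commands copied from the hosting site do two things: register the remote repository under a name, and push your commits to it. The sketch below illustrates both steps offline. It assumes a local bare repository as a stand-in for the hosted one, so the path passed to git remote add is a placeholder for the URL shown on your own repository page, and the identity and file contents are placeholders too.

```shell
set -eu

demo_dir="$(mktemp -d)"

# A local bare repository stands in for the hosted repository (assumption for this sketch).
git init --quiet --bare "$demo_dir/remote.git"

# Scratch project with one commit (placeholder identity and content).
mkdir "$demo_dir/project"
cd "$demo_dir/project"
git init --quiet
git config user.name "Example Student"
git config user.email "student@example.com"
echo "# placeholder analysis script" > houseelf-analysis.R
git add houseelf-analysis.R
git commit --quiet -m "Add analysis script"

# Register the remote under the conventional name "origin".
# With a hosted repository you would use the URL from its web page instead of a local path.
git remote add origin "$demo_dir/remote.git"

# Push the current branch and record origin as its upstream (-u),
# so that later pushes and pulls need no extra arguments.
git push --quiet -u origin HEAD

# List the configured remotes to confirm the connection.
git remote -v
```

Once the upstream is recorded, the Push and Pull buttons in RStudio know where to send and fetch changes.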
-- Pushing Changes --
This is a follow up to Adding a Remote.
Now that you’ve set up your remote repository for collaborating with Dr. Granger you’d better get to work since she can see everything you’re doing.
- Write a function to calculate the GC-content of a sequence, regardless of the capitalization of that sequence. (Hint: the function str_to_lower or str_to_upper in the stringr package might be useful.) This function should also be able to take a vector of sequences and return a vector of GC-contents (it probably does this without any extra work, so give it a try).
- Commit this change.
- Once you’ve committed the change, click the Push button in the upper right corner of the window and then click OK when git is done pushing.
- You should be able to see the changes you made on GitHub.
- Email your teacher to let them know that you’ve finished this exercise.
-- Pulling and Pushing --
This is a follow up to Pushing Changes.
STOP: Wait until your teacher has told you they’ve updated your repository following the last exercise before doing this one.
While you were working on your vectorized GC-content function, Dr. Granger (who has suddenly developed some pretty impressive computational skills) has been writing a vectorized ear length categorizer. To get it you’ll need to pull the most recent changes from GitHub.
- On the Git tab click on the Pull button with the blue arrow. You should see some text that looks like:

      From github.com:ethanwhite/gryffindorforever
         1e24ac8..815e600  master     -> origin/master
      Updating 1e24ac8..815e600
      Fast-forward
       testme.txt | 1 +
       1 file changed, 1 insertion(+)
       create mode 100644 youareawesome.txt

- Click OK.
- You should see the new function in your repository.

      get_size_class <- function(ear_length){
        # Calculate the size class for one or more ear lengths
        ear_lengths <- ifelse(ear_length > 10, "large", "small")
        return(ear_lengths)
      }

- Write some new code that creates a data frame with information about the individual ID, the ear length class, and the GC-content for each individual.
- Save this data frame as a csv file using write.csv().
- Commit the new code and the resulting csv file and push the results to GitHub.
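The pull step in this exercise can also be tried from a terminal. The sketch below is an offline illustration under stated assumptions: a local bare repository stands in for the hosted one, a second clone plays the part of Dr. Granger, and the file name size-class.R, the identities, and the file contents are hypothetical placeholders.

```shell
set -eu

demo_dir="$(mktemp -d)"

# A local bare repository stands in for the hosted repository (assumption for this sketch).
git init --quiet --bare "$demo_dir/remote.git"

# Your copy: one commit, pushed to the shared repository.
mkdir "$demo_dir/student"
cd "$demo_dir/student"
git init --quiet
git config user.name "Example Student"
git config user.email "student@example.com"
echo "# placeholder analysis script" > houseelf-analysis.R
git add houseelf-analysis.R
git commit --quiet -m "Add analysis script"
git remote add origin "$demo_dir/remote.git"
git push --quiet -u origin HEAD

# The collaborator clones the shared repository, adds a file, and pushes it.
git clone --quiet "$demo_dir/remote.git" "$demo_dir/collaborator"
cd "$demo_dir/collaborator"
git config user.name "Example Collaborator"
git config user.email "collaborator@example.com"
echo "# placeholder for the collaborator's new function" > size-class.R
git add size-class.R
git commit --quiet -m "Add ear length categorizer"
git push --quiet

# Back in your copy: pull the collaborator's commit.
# --ff-only refuses to do anything other than a fast-forward.
cd "$demo_dir/student"
git pull --ff-only
git log --oneline
```

After the pull, your history contains the collaborator's commit, and your next commit and push build on top of it.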