R is a programming language that is widely used for statistics and data visualization. This week, in addition to completing our second project, we are finally taking a look at R. Before the course started I was so excited about R that I had it downloaded and ready to go, had already read a couple of chapters, and had downloaded some PDF documentation to read as well. Although I wasn’t adventurous enough to try any actual coding at the time, I was looking forward to revisiting programming, which I abandoned when I changed majors all those years ago, before I really understood how ubiquitous computing would become. Then, anticlimax: R was set aside as we explored spreadsheets, other data visualization tools and basic concepts in statistics. Since it has been over twenty years since I had any instruction in statistics, and I have yet to master any type of spreadsheet, this has been extremely useful for me!
If you are not already immersed in Information Science, you may wonder why data visualization, data management and even learning to code matter for a Masters in Library and Information Science. For someone planning to work in a small public library, this may not seem like an essential skill set; developing useful programming for adults and children in your community, along with outreach, may seem more immediately relevant. But because technology is everywhere and data has real power to influence thought, it is more important than ever that any LIS professional at least be aware of these tools, and the ability to use them and promote them to patrons could be extremely useful. From helping someone who is unemployed build job skills for better opportunities to getting young people from underrepresented populations interested in STEM/STEAM careers, a little data science skill can go a long way. For someone interested in working in an academic or special library setting, these skills can be as essential as reference. If you are interested in courses in programming and R check out the following link:
But I digress. The assignment for this module is to take a given data set and produce a visualization in R. The recommendation is to use RStudio, an integrated development environment (IDE) for R (GUI pictured above), as an easier way to develop data visualizations in R. You must have R already installed in order to use RStudio, and the download and installation process for both is extremely simple. The documentation for R is pretty much just text and is difficult to follow, but RStudio has video tutorials and there are lots of other resources. I downloaded it and watched a couple of tutorials, then set it aside to work on other projects. Now I have to start over again, ugh!
Even though I like videos, today I decided to start with Computerworld’s Beginner’s Guide to R: Introduction, since a written guide would let me jump around to find what I need to do and the steps to get there. In Part 2, Getting Data Into R, I discovered I could create a text file of the module 9 data set and import it rather than enter the values manually. I created a text file in CSV format with the data set given in the module 9 assignment: 10, 20, 30, 40, 50, 60, 70, 80, 81 and called it <mod nine csv.txt>. I tried using the commands in the article but ended up using the <Environment> pane, where I chose the <Import Dataset> drop-down and then <Local file>, and there was the data set, with its values in columns named v1-v9.
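For anyone who would rather do the import from the console, here is a minimal sketch of one way it could be done with base R's read.csv(). It is not necessarily the command from the article; the temporary file location is an assumption made so the example runs anywhere, and the object name mod.nine.csv simply mirrors the name used later in this post.

```r
# Assumption: write the module 9 values to a temporary file so the sketch
# is self-contained. In practice you would point at your own text file.
csv_path <- file.path(tempdir(), "mod nine csv.txt")
writeLines("10, 20, 30, 40, 50, 60, 70, 80, 81", csv_path)

# header = FALSE tells R the file has no row of column names,
# so R names the columns V1, V2, ... on its own.
# strip.white = TRUE drops the spaces that follow each comma.
mod.nine.csv <- read.csv(csv_path, header = FALSE, strip.white = TRUE)

print(mod.nine.csv)       # one row, nine columns
print(dim(mod.nine.csv))  # number of rows, then number of columns
```

Because the file is a single line of values, R reads it as one observation with nine variables, which matters for the plotting and summary steps that come next.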
Now to create a visualization! I tried the simple command <plot(mod.nine.csv)> and was given this uninspiring plot. Not sure what to do next, I went back to the guide and discovered some simple functions for exploring the data.
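A likely reason the default plot is uninspiring: when plot() is given a data frame with more than two columns, base R draws a grid of scatterplots, one for every pair of columns, and with a single row each panel holds only one point. The sketch below is one possible workaround, not the method from the guide: it flattens the one-row data frame into an ordinary vector first. The data frame is rebuilt by hand here so the example stands alone, and the title and axis labels are made up for illustration.

```r
# Rebuild the imported data by hand: one row, nine columns named V1..V9.
mod.nine.csv <- as.data.frame(t(c(10, 20, 30, 40, 50, 60, 70, 80, 81)))

# Flatten the single row into a plain numeric vector of nine values.
values <- unlist(mod.nine.csv, use.names = FALSE)

# Plot the values in order; type = "b" draws both points and connecting lines.
plot(values,
     type = "b",
     main = "Module 9 data set",   # illustrative title
     xlab = "Position in data set",
     ylab = "Value")
```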
I discovered I could display the data with the <head> and <tail> functions, but the str() command did not produce the output the article described: I got a list of the values rather than a range. The article also discusses summary statistics, but these are calculated on columns of data, and each column of this set has only one value.
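The difference probably comes down to the shape of the data. Here is a small sketch, using the same nine values, that compares the wide layout produced by the import (one row, nine columns) with a long layout (nine rows, one column). The names wide, long and value are made up for this example.

```r
values <- c(10, 20, 30, 40, 50, 60, 70, 80, 81)

# Wide layout, like the imported file: 1 observation of 9 variables.
wide <- as.data.frame(t(values))
str(wide)   # lists each column with its single value

# Long layout: 9 observations of 1 variable.
long <- data.frame(value = values)
str(long)   # one column holding all nine values

head(long, 3)   # first three rows
tail(long, 3)   # last three rows

# With all the values in one column, the statistics become meaningful.
# The values add up to 441, so the mean is 441 / 9 = 49.
summary(long$value)
```

In the long layout the column statistics describe the whole data set instead of a single number per column.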
So, moving on: how do I create an interesting visual with this data set? Returning to the module 9 presentation, I discovered that I can simply enter the data by assigning it to a variable (especially since this is such a small data set) and can make a pie chart with very little effort by copying the code used in the presentation. Making the pie chart visually interesting is a whole different matter, though. I had trouble assigning the colors. I was more successful with the barplot function.
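This is not the code from the presentation, just a minimal sketch of what the assignment, pie chart and bar plot could look like in base R. The variable names, the labels and the choice of the rainbow() palette are all assumptions made for illustration.

```r
# Enter the data directly by assigning it to a variable.
mod9 <- c(10, 20, 30, 40, 50, 60, 70, 80, 81)

# Hypothetical labels: the assignment does not name the values,
# so these are placeholders ("Value 1", "Value 2", ...).
value_labels <- paste("Value", seq_along(mod9))

# One color per value; rainbow() is just one of several built-in palettes.
value_colors <- rainbow(length(mod9))

# Pie chart: labels and colors are matched to the values by position.
pie(mod9,
    labels = value_labels,
    col = value_colors,
    main = "Module 9 data as a pie chart")

# Bar plot of the same data; las = 2 turns the bar labels sideways so they fit.
barplot(mod9,
        names.arg = value_labels,
        col = value_colors,
        main = "Module 9 data as a bar plot",
        ylab = "Value",
        las = 2)
```

The color vector has to be the same length as the data so that each slice or bar gets its own color. If it is shorter, R recycles the colors.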
I haven’t quite learned how to label the variables, and I would prefer to work through some of the tutorials with more complex data sets before spending too much more time on this one. So far, though, it has not been too bad, once I decided to go back to the professor’s presentation and resources list rather than muddle through on my own.
I still need to finish Project 2, so I will leave you here for now. I hope to have a bit more accomplished with R next week.