ORGANIZATION & STATISTICS PROGRESS

On organizing coursework notes, statistics progress, and further contemplation about next steps.

AUGUST 31, 2016/BARRY COLONNA

This week’s journal entry will be a bit shorter than usual. I’ll let you decide if that’s a good thing! This week has been extremely hectic and a lot has come up that’s kept me from my studies.

I began organizing my R notes and I’m continuing my statistics studies. I haven’t yet decided on my next course, but I’ll discuss it a little further at the end.

Organization

I thought organizing my R notes from The Analytics Edge on edX would take no more than a few days. Boy, was I wrong! My notes are a disjointed mess, not to mention that I have many pages to go through.

During the lectures, I took copious notes to make sure I understood every command I typed. While that’s great, I repeat myself quite often, some of my notes are unhelpful or make no sense, and on some important topics I wrote nothing at all, either because I forgot or because the instructor didn’t explain that specific detail.

On top of that, all the side notes are mixed in with the R commands and arguments. In the R console, it’s pretty easy to differentiate, but not so much in a text file.

All of this is to say that it’s taking a lot longer than I originally expected. I’ve made a lot of progress and I’m glad I’m doing it, but the notes are a mess. Did I mention that already?

I’m also writing down every single command that I’ve learned, no matter how obvious or easy it seems to me now. It appears as though my next courses will not be utilizing R, so I want to make sure I have everything I need in case I forget when I return to it. Having it in a Word file will also make it easier to search for what I’m looking for, rather than needing to open a ton of text files to find one line of code.

I should be finished this week.

Statistics

As I mentioned, I haven’t been able to focus on my studies this week nearly as much as usual. I have completed several lessons in statistics on Khan Academy. I’m currently on probability. It’s one of the longest lessons in statistics, and I’m about one-third of the way through it.

So far, we haven’t done anything too terribly advanced and I’m recalling more and more from my college class as we go. However, there were several lectures on linear regression proofs.

You know how much I love proofs (sarcasm), but deriving the formula for the regression line is actually pretty intense, and it amazes me how people originally came up with it and other formulas. The proof involves a lot of algebra and some calculus. We never went into the derivation in my college class, and I do think it’s important to at least know where the formula comes from.
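For reference, here is a compact sketch of the standard least-squares derivation. The notation (slope m, intercept b, bars for averages) is my own choice for illustration, and the lectures may order the steps differently.

```latex
% Fit the line y = m x + b to points (x_1, y_1), ..., (x_n, y_n)
% by minimizing the squared error:
\[
  SE(m, b) = \sum_{i=1}^{n} \bigl( y_i - (m x_i + b) \bigr)^2
\]
% Calculus step: set both partial derivatives to zero.
\[
  \frac{\partial SE}{\partial m} = -2 \sum_{i=1}^{n} x_i \bigl( y_i - m x_i - b \bigr) = 0,
  \qquad
  \frac{\partial SE}{\partial b} = -2 \sum_{i=1}^{n} \bigl( y_i - m x_i - b \bigr) = 0
\]
% Algebra step: divide by n and write averages with bars.
\[
  \overline{xy} - m\,\overline{x^2} - b\,\bar{x} = 0,
  \qquad
  \bar{y} - m\,\bar{x} - b = 0
\]
% Solve the second equation for b, substitute into the first, and solve for m.
\[
  b = \bar{y} - m\,\bar{x},
  \qquad
  m = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}
\]
```

The second result says the best-fit line always passes through the point of averages, which is a handy sanity check.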

Linear regression fits a straight line to a set of data, and that line can then be used to make predictions. For example, if you have the median wage increase for each of the past 20 years, you can use that data to create a regression line on a plot. You can then calculate what you predict the wage will be at a certain point in the future. This obviously only works if the variables are correlated, meaning the points roughly follow a line. It doesn’t do a whole lot of good for random data.
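To make the wage example concrete, here is a minimal Python sketch of fitting a regression line and using it for a forecast. The five data points are made up purely for illustration, and the function names `fit_line` and `pearson_r` are my own, not from any course material.

```python
from math import sqrt
from statistics import mean


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def pearson_r(xs, ys):
    """Return the correlation coefficient, a rough check that a line fits at all."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: median wage (in thousands of dollars) for five years.
years = [2011, 2012, 2013, 2014, 2015]
wages = [40.0, 41.1, 41.9, 43.2, 43.8]

slope, intercept = fit_line(years, wages)
r = pearson_r(years, wages)
forecast_2016 = slope * 2016 + intercept

print(f"slope: {slope:.2f} per year, correlation: {r:.3f}")
print(f"forecast for 2016: {forecast_2016:.2f}")
```

A correlation close to 1 or -1 suggests the line is a reasonable summary of the data; a value near 0 is the "random data" case where the forecast means very little.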

We also covered the normal distribution. This is a probability distribution for a random variable whose density forms a bell-shaped curve. For example, class grades often roughly follow a normal distribution, with the majority of the grades in the middle and a few outliers at the top and bottom of the class.

Khan created a pretty neat Excel spreadsheet that can be used to experiment with the distribution curve and see how it’s affected by different parameters. It was used in the lectures, and it’s free for anyone to download.
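In the same spirit as that spreadsheet, here is a small Python sketch for playing with the two parameters of a normal distribution, the mean and the standard deviation. The grade numbers (mean 75, standard deviation 10) are assumptions I picked for illustration, and this is not a reproduction of the Khan Academy file.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the bell curve at x for mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Probability that a normally distributed value is less than or equal to x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


# Hypothetical class grades: mean 75, standard deviation 10.
mu, sigma = 75.0, 10.0

# Share of grades within one standard deviation of the mean (65 to 85).
within_one_sd = normal_cdf(mu + sigma, mu, sigma) - normal_cdf(mu - sigma, mu, sigma)
print(f"share of grades between 65 and 85: {within_one_sd:.3f}")

# The curve is tallest at the mean and falls off toward the outliers.
for grade in (55, 65, 75, 85, 95):
    print(grade, round(normal_pdf(grade, mu, sigma), 4))
```

Changing `mu` slides the curve left or right, while a larger `sigma` makes it shorter and wider.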

Next Steps

I still haven’t made my final decision on the next course I’ll take. I’m leaning toward the Machine Learning Specialization from the University of Washington on Coursera.

Course description from the website: “You’ll learn to analyze large and complex datasets, build applications that can make predictions from data, and create systems that adapt and improve over time.”

It sounds great and it would be a fantastic introduction to machine learning that I could expand on after completing my math courses. I’ve thought of some pros and cons to taking this specialization:

Pros: It has great reviews and recommendations from other students, it’s exceptionally in-depth, it covers the information I want to learn, it’s an 8-month track that can be taken concurrently with my math track, and I meet the prerequisites.

Cons: It uses Python instead of R. Also, it’s on Coursera. I spoke about this last week, but I prefer edX or the like because it’s nonprofit and provides all the course material for free. Coursera is for-profit. They allow you to watch all of the lectures, but many of the assignments are entirely blocked unless you pay for the class. I’m worried that my learning may be hindered if I don’t have access to all of the assignments; I won’t be able to apply what I learn in the lectures. I have no problem paying for my education when I can, but I feel that Massive Open Online Courses (MOOCs) should be free.

Although my cons paragraph is much longer, I think the pros may win out in this particular case. I hope. There appear to be about 10 quizzes for each class in the specialization and a final capstone project at the end where you solve a real-world problem by implementing machine learning algorithms.

It’s the project that I’d miss out on the most. Still, I don’t want to forgo an amazing set of classes solely because of the bitter taste my past negative experiences with Coursera have left.

It is my primary choice thus far, but I still want to look into a few other options this week before I make my final decision.

Conclusion

Hopefully this week will be calmer than last week, but it hasn’t started off that way, so we’ll see! I should at least know what my next class will be by the next time we meet, even if I haven’t started it yet.

Wish me luck on deciphering my analytics notes and I’ll see you next week. Thank you for reading!




