
STATISTICS & INTEGER OPTIMIZATION

On statistics, integer optimization, and options for my next data science course.


AUGUST 24, 2016/BARRY COLONNA

After my previous journal entry, I spent a few days debating whether I should take statistics or precalculus first. As you may have guessed from the title, I chose statistics, which I finally began on Saturday.

I’ve also completed my analytics classes. We concluded the course with integer optimization, which I’ll speak of briefly below.

The last thing I’ll discuss today is some potential online data science courses I found, and the unlikely place where I discovered them.

Statistics


I took statistics in college and really enjoyed it. However, I have been bitter about that specific class since taking it. I was never one of those kids who freaked out if they didn’t get an A in a class. Those kids used to annoy me. That is, until college. Then I became one of those kids! Getting a B was awful. An A- made me insane because a minus affected our GPA at my college.

Back to that statistics class. I received an A on all of my quizzes and tests. You might even say I was an exemplary student, if you so desire. Except for my attendance. I was a couple of minutes late to class on no more than five occasions, and I missed a couple of lectures during the semester. My fault entirely, but I never missed an assignment.

My professor, in all her wisdom, docked points off your grade if you were ever tardy or absent from the class. Quite a few points, actually. In fact, she lowered my letter grade for the course from an A to a B solely due to my attendance.

I get that I had no excuse for my tardiness, but I have to throw down the bullshit flag here. You shouldn’t ever be late for a job or for grade school or high school classes, but this is college. I paid for the class. If someone doesn’t want to attend a class, that’s their prerogative. Attendance alone shouldn’t impact your grade. Obviously if you miss assignments or tests because of it, then of course it should matter. But I didn’t, and I don’t feel that I deserved that B.

My GPA, which was pretty good, no longer has any impact whatsoever on my life, but that grade still bothers me even today.

Anyway, I began statistics on Khan Academy several days ago. I’ve only completed the first lesson thus far, but it was an exceptionally long lesson.

When I say lesson, I’m referring to all of the lectures and quizzes within a section. For example, exponential & logarithmic functions in algebra II, or the unit circle definition of sine, cosine, and tangent in trigonometry. Each lesson has multiple lectures covering all the topics within that heading.

The first lesson of statistics is displaying and describing data. Fun stuff!

It actually is kind of fun. We’ve only covered the basics of statistics so far: mean, median, mode, range, standard deviation, variance, interquartile range, different types of plots, etc.
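For anyone who wants to check these definitions by hand, here is a minimal sketch using Python’s built-in `statistics` module. The data set is made up purely for illustration. Note that `pvariance` and `pstdev` treat the numbers as a whole population, while `variance` divides by n - 1 for a sample; which one a quiz wants depends on the problem.

```python
import statistics

# Hypothetical sample data, made up purely for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)            # arithmetic average
median = statistics.median(data)        # middle value (average of the two middle values here)
mode = statistics.mode(data)            # most frequent value
data_range = max(data) - min(data)      # spread from smallest to largest value
pop_var = statistics.pvariance(data)    # population variance (divides by n)
pop_sd = statistics.pstdev(data)        # population standard deviation
sample_var = statistics.variance(data)  # sample variance (divides by n - 1)

print(mean, median, mode, data_range, pop_var, pop_sd, sample_var)
```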


It’s funny that I’ve spent two months running statistical analyses in The Analytics Edge and I didn’t know how to read a box plot. I had completely forgotten what it meant after all these years.
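As a refresher for myself: a box plot is drawn from the five-number summary (minimum, first quartile, median, third quartile, maximum), and points more than 1.5 times the interquartile range beyond the box are commonly drawn as individual outlier dots. Below is a rough Python sketch of that calculation. The quartile rule it uses (median of the lower and upper halves, leaving out the middle value when the count is odd) is one common textbook convention; software packages often interpolate quartiles differently, so their numbers can differ slightly. The data are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Min, Q1, median, Q3, max using the 'median of each half' convention.
    Needs at least two values."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                              # values below the median
    upper = xs[half + 1:] if n % 2 else xs[half:]  # values above the median
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def outlier_fences(q1, q3):
    """Tukey's 1.5 * IQR rule: points outside these fences are plotted as outliers."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical data for illustration.
data = [1, 3, 4, 5, 5, 6, 7, 8, 9, 25]
lo, q1, med, q3, hi = five_number_summary(data)
low_fence, high_fence = outlier_fences(q1, q3)
outliers = [x for x in data if x < low_fence or x > high_fence]
print(lo, q1, med, q3, hi, outliers)
```

With this data the box runs from 4 to 8 with the median line at 5.5, the fences sit at -2 and 14, and 25 would be plotted as a lone outlier dot, with the upper whisker stopping at 9.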

This refresher was nice before we delve into more advanced topics. I’m also happy with my decision to take statistics first. I think it will help me in my upcoming analytics and data science classes.

I will say, however, that one of the first quizzes in the class asked a lot of questions about topics we hadn’t covered up until that point. I am incapable of ignoring any of the quizzes, so I continued working on it until I got the required 5 questions in a row correct. I feel that this quiz should have been toward the end of the lesson, rather than the beginning.

Other than that, the class has been great and Sal Khan does a fantastic job teaching all of the math topics. I couldn’t be happier that Khan Academy exists and I can’t imagine taking math anywhere else.

Integer Optimization


We covered integer optimization during the last week of The Analytics Edge on edX. It’s similar to linear optimization, which we studied last week, except that the decision variables can only take integer values (whole numbers).

I’m still using OpenOffice, though the assignments can also be completed in Microsoft Excel or LibreOffice. I have Excel, but it was easier to follow along with the lectures in one of the other two programs.
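To see why the whole-number restriction matters, here is a small made-up problem (not one from the course): maximize 5x + 4y subject to 6x + 4y ≤ 24, x + 2y ≤ 6, and x, y ≥ 0. The linear optimum is x = 3, y = 1.5, but rounding it either breaks a constraint or misses the best integer answer. The sketch finds the integer optimum by brute-force enumeration, which only works for tiny problems; real solvers typically rely on methods such as branch and bound.

```python
from itertools import product

# Hypothetical toy problem (not from the course):
#   maximize   5x + 4y
#   subject to 6x + 4y <= 24,  x + 2y <= 6,  x, y >= 0
def feasible(x, y):
    return x >= 0 and y >= 0 and 6 * x + 4 * y <= 24 and x + 2 * y <= 6

def objective(x, y):
    return 5 * x + 4 * y

# The continuous (linear) optimum is x = 3, y = 1.5, with objective 21.
print("linear optimum:", (3, 1.5), objective(3, 1.5))

# Integer optimum by brute force: the constraints imply x <= 4 and y <= 3,
# so this small box contains every feasible whole-number point.
candidates = [(x, y) for x, y in product(range(5), range(4)) if feasible(x, y)]
best = max(candidates, key=lambda p: objective(*p))

print("integer optimum:", best, objective(*best))
print("rounded down (3, 1):", objective(3, 1))
print("rounded up (3, 2) feasible?", feasible(3, 2))
```

Here the best integer solution is (4, 0) with an objective of 20, while rounding the linear solution down to (3, 1) only reaches 19, and rounding up to (3, 2) violates the first constraint.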

I’m becoming more comfortable with optimization. I still need to work on it, but it’s making more sense to me. My mind puts up a wall when I try to use spreadsheets, which I’m slowly breaking down. I’m still not sure why I dislike them so much. They’re useful and not all that difficult to utilize.

In the lectures, we learned how integer optimization is used to quickly and efficiently schedule sports teams and hospital operating rooms, as well as to increase the probability of matches on eHarmony.


For operating rooms, you need to know how many rooms each department needs, how often they need them, the minimum and maximum number each can be given, etc. You write these constraints, along with the other data and the objective, into the spreadsheet. The solver then calculates how to divide up the available operating rooms each week.

It’s actually pretty cool, and it simplifies complex problems that used to take weeks or months by hand (in the case of sports team schedules).
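A toy version of the operating room problem might look like the sketch below. Everything in it is hypothetical: the department names, the 10 rooms per week, the minimums and maximums, and the choice of objective (get each department as close as possible to a fractional target). It uses brute-force enumeration instead of a spreadsheet solver, which is only reasonable because the problem is tiny.

```python
from itertools import product

# Hypothetical data: rooms available per week and, for each department,
# (minimum rooms, maximum rooms, target rooms). Targets can be fractional,
# but an allocation must be whole rooms, which makes this an integer problem.
ROOMS_AVAILABLE = 10
DEPARTMENTS = {
    "general_surgery": (2, 5, 3.6),
    "orthopedics":     (1, 4, 2.7),
    "cardiology":      (2, 5, 3.7),
}

def total_deviation(allocation):
    """Objective: total absolute distance from each department's target."""
    return sum(abs(rooms - DEPARTMENTS[d][2]) for d, rooms in allocation.items())

def best_allocation():
    names = list(DEPARTMENTS)
    ranges = [range(lo, hi + 1) for lo, hi, _ in DEPARTMENTS.values()]
    best, best_score = None, float("inf")
    # Enumerate every whole-number allocation within each department's min/max.
    for counts in product(*ranges):
        if sum(counts) != ROOMS_AVAILABLE:  # must use exactly the rooms available
            continue
        allocation = dict(zip(names, counts))
        score = total_deviation(allocation)
        if score < best_score:
            best, best_score = allocation, score
    return best, best_score

allocation, score = best_allocation()
print(allocation, round(score, 2))
```

Rounding each target to the nearest whole number would ask for 11 rooms, so the search has to decide which department gives one up; it settles on 3, 3, and 4 rooms for a total deviation of about 1.2.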

Now that analytics is over, I’ll begin organizing all of my notes for the course so I can use all of the techniques I learned in the future. I only hope my future classes are as good as this one and the ones on Khan Academy.

Options for Future Study

I’ve been browsing through edX and Coursera for a new class to replace The Analytics Edge.

During my search, I realized one of the reasons I’m not as fond of Coursera, and why it doesn’t let you access quizzes and assignments without paying for the class (at least for the classes I tried, anyhow).

Coursera is a for-profit organization, while edX is nonprofit. This isn’t an issue in and of itself. There are many institutions and companies that are designed for profit, most in fact. In the educational world, however, it’s my opinion that this can cause a conflict of interest.

The educational entity is more interested in making money than in providing quality instruction. I read a study showing that, on average (yay statistics), nonprofit institutions charge less for tuition and spend five times more per student than for-profit schools. That’s huge!

Not having access to assignments or quizzes greatly diminishes the level of learning. edX doesn’t require you to pay unless you want a certificate. Otherwise, all of the content is freely available. That’s why I’m leery of returning to Coursera, but I have found a few classes there that seem promising.

It seems that every time I find a class that sounds interesting, I read horrible reviews. I don’t base my life on reviews, but if a staggering number of students rate a class poorly, it gives me cause to question my choice to take it.

Since that kept happening, and there are so many options in the field of data science, I decided to google it to see if there were any forums or discussions on good classes.

This led me to Reddit of all places. I’m not really into Reddit and I rarely go there, but I was surprised by how great a resource it turned out to be. People gave their insight into many of the classes I was considering, and some I had no knowledge of.

Current Candidates:

Machine Learning Specialization from the University of Washington
Platform: Coursera

Website: Machine Learning Specialization

The specialization consists of six classes. According to the course guide, it takes approximately 8 months to complete the specialization. That’s just shy of the time I estimated for my mathematics lessons. It is on Coursera, which I try to avoid, but many people have said that it’s an excellent course, and it sounds quite in-depth.

If I do complete this specialization and my math lessons in the allotted time, I’d be ready to move onto much more advanced classes that I cannot take without linear algebra or multivariable calculus, and I’d have some decent machine learning knowledge to boot.

It uses Python instead of R. Data scientists should be comfortable in any programming language, but I was hoping to learn R first. It’s supposed to have a steeper learning curve than Python, and I thought it would be better to use as my first language. That’s not a deal breaker, but a consideration.

It’s taught by Emily Fox and Carlos Guestrin, who are both Amazon Professors of Machine Learning.

Statistical Learning from Stanford
Platform: Stanford Online

Website: Statistical Learning

This course covers a lot of the same topics that I learned in The Analytics Edge, but it does seem to go more in depth, and it covers some additional statistical techniques I have not yet learned. It uses R extensively, which is good.

The prerequisites for this course may prohibit me from taking it at this time, however. Namely, students should have knowledge of linear algebra and manipulating discriminates, which I cannot do yet. However, the course description says it’s not a math-heavy class and doesn’t focus on formulas.

We’ll see. There are a lot of great reviews for the course and the professors are leaders in their field.

It’s taught by Trevor Hastie and Rob Tibshirani, statistics professors at Stanford University.

Learning from Data from Caltech
Platform: edX

Website: Learning from Data

This is an introductory computer science course that covers the basic theory, algorithms, and applications of machine learning. It balances theory and practice, along with the mathematical and heuristic aspects of the field. Caltech says this is not watered down from the university class, which I do appreciate.

It’s actually supposed to be a really difficult class, with graduate-level discussions of topics. I enjoy a challenge, but the prerequisites for the course are knowledge of matrices and calculus. I’ll probably wait on this one until I have a little more knowledge. I mostly wanted to note it here so I can reference it later.

It’s taught by Yaser S. Abu-Mostafa, a professor of electrical engineering and computer science at Caltech.

Machine Learning Lectures from Nando de Freitas
Platform: YouTube

Website: Machine Learning

Nando de Freitas posted many of his machine learning lectures on YouTube while he worked as a professor at the University of British Columbia. He wanted to make sure everyone had the opportunity to learn. He is now a professor at Oxford and an authority in the field of machine learning.

YouTube seems to be an odd place to take a class, but Sal Khan began Khan Academy on YouTube. Even though you don’t have the opportunity to take quizzes or work on assignments, it can still be informative. I read great things about Nando, so I will definitely look at his videos. He has quite a few lectures on machine learning and deep learning that sound interesting.

Microsoft Azure Machine Learning from Microsoft Virtual Academy
Platform: Microsoft Virtual Academy

Website: Microsoft Azure Machine Learning

I am a fan of Microsoft in general, but I was not aware of their online courses. I had read that their beginner’s machine learning course was good, so I included it here. In my quick scan of some of the classes, it appears that most are for advanced programmers to get additional knowledge in their field.

However, they do have their share of beginner and intermediate courses too. This one covers data mining and predictive analytics. It’s a relatively short class, but it’s supposed to be good. It does say to download the Azure Machine Learning trial before starting the course. The trial lasts one month, which is long enough for the class, but I hate downloading trials unless I’m truly going to get the most out of them. It sounds like an interesting program, and I’m familiar with Microsoft Azure in general, but I don’t want to use it once during the trial and then have to pay to use it if I need it again. I’ll keep this on the back burner for now, but I’ll definitely look at their other course offerings.

It’s taught by Seayoung Ree, Buck Woody, and Scott Klein, all senior technical employees at Microsoft.

Conclusion

I’m sorry for writing so much today. I couldn’t stop! Hopefully there was something of interest to you here. Comment below if you know of any other classes or if you’ve had experience with one of the ones above. Thank you for following me and I’ll see you next week!





JOURNAL

This journal will be about my journey to become a data scientist and better myself through education and fitness.

I hope that my words inspire you to follow your dreams and show you that it's never too late to make a change.

SCHEDULE

Data science posts every Wednesday.

Health posts every other Sunday.
