WIKIPEDIA & VECTORS

On using clustering in Python to create recommendations from Wikipedia articles. Study of vectors in precalculus.

SEPTEMBER 21, 2016/BARRY COLONNA

I slowed down a bit on my coursework compared to last week. I’ve been working on some personal tasks and trying to optimize my website and ancient computer. As such, I only completed week 4 of my machine learning class this week.

Although it makes sense to complete one week of material per week, the intro class isn't terribly in-depth, so it's possible to do more, at least until I reach the more advanced courses in the Machine Learning Specialization on Coursera.

I think I’m three weeks ahead of the schedule for the class (you go at your own pace), so I’m feeling good about my progress thus far. I’m also making my way through precalculus relatively quickly.

Machine Learning

We learned how to do clustering in Python this week in Machine Learning Foundations. You may recall that I studied clustering in my analytics class on edX, but I used R back then. Way back then. A whole six or seven weeks ago.

For the purposes of this lesson, we used prebuilt functions from GraphLab Create to sort and analyze the data. In future classes, we'll learn how to implement the algorithms ourselves. The important part of this class is understanding how the algorithms work and the ways they can be used.
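Since the point of this class is understanding how the algorithms work, here is a minimal sketch of k-means, one common clustering algorithm, written in plain Python. It is not GraphLab Create's implementation; it assumes 2-D points and naively uses the first k points as the starting centroids (real implementations usually pick starting points more carefully).

```python
import math

def kmeans(points, k, iterations=10):
    """Group 2-D points into k clusters using Lloyd's k-means algorithm."""
    # Naive initialization (an assumption for this sketch): first k points.
    centroids = [tuple(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                xs, ys = zip(*members)
                new_centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            else:
                new_centroids.append(centroids[i])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # assignments stopped changing, so the algorithm has converged
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1.5, 2), (8, 8), (9, 8.5), (1, 0.5), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)
print(centroids)  # one centroid near (1.17, 1.17), the other at (8.5, 8.5)
```

The two steps, assign points to the closest center and then move each center to the middle of its points, repeat until nothing changes.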

For this lesson, we used a corpus (a collection of documents) of Wikipedia entries for famous people. Our goal was to use Python to create a recommender system. For example, if someone was reading about Barack Obama, we used the words his article shares with other articles to find who else that reader might be interested in. Entering his name returns the most similar articles, based on the text of each document.
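To show the idea behind "similar based on the text," here is a minimal plain-Python sketch. It does not use GraphLab Create's API; it assumes a tiny made-up corpus (the one-line "articles" below are hypothetical, not real Wikipedia text), weights words with TF-IDF so that rare, distinctive words count more than common ones, and ranks other documents by cosine similarity.

```python
import math
from collections import Counter

# Hypothetical toy corpus: made-up one-line "articles" for illustration only.
corpus = {
    "Barack Obama": "president united states senator illinois law harvard",
    "Joe Biden": "vice president united states senator delaware law",
    "Taylor Swift": "singer songwriter country pop music album grammy",
    "Elton John": "singer songwriter piano pop music album knighted",
}

def tf_idf(docs):
    """Turn each document into a dict mapping word -> TF-IDF weight."""
    counts = {name: Counter(text.split()) for name, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: how many documents contain each word.
    df = Counter(word for c in counts.values() for word in c)
    return {
        name: {w: tf * math.log(n_docs / df[w]) for w, tf in c.items()}
        for name, c in counts.items()
    }

def cosine(a, b):
    """Cosine similarity between two sparse word-weight vectors."""
    dot = sum(weight * b.get(w, 0.0) for w, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(name, docs, top_n=2):
    """Return the top_n (person, score) pairs most similar to `name`."""
    vectors = tf_idf(docs)
    query = vectors[name]
    scores = [(other, cosine(query, vec)) for other, vec in vectors.items() if other != name]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]

print(recommend("Taylor Swift", corpus, top_n=1))  # Elton John ranks first
```

Because the function takes any name as input, the same setup works for every person in the corpus.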

Once that algorithm is set up, it can be used for any person in the corpus. We used it for Elton John, Taylor Swift, David Beckham, Angelina Jolie, etc. I'm not sure how it would be implemented in an actual web application or site; we haven't learned how to do that yet, but we will!

I’m really digging Python with the Jupyter Notebook. It’s so much cleaner and more user-friendly than the standard R console. You can create headings, edit previous cells, and put multiple lines of code in a single cell. I also like that you seem to be able to tack commands onto the end of a line of code instead of writing multiple lines to do one thing.

I have a lot to learn, and I still know more R commands than Python ones. It’s quite different, but I really like it, and I’m looking forward to the rest of the classes in the Machine Learning Specialization.

Vectors

I finished conic sections and vectors this week in precalculus on Khan Academy.

Conic sections are the shapes made when a plane slices through a cone (strictly, a double cone). Circles, ellipses, parabolas, and hyperbolas are all conic sections. We learned the formulas for each of these, along with their distinguishing features.
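For reference, these are the usual textbook standard forms, written with the center (or, for the parabola, the vertex) at $(h, k)$. The parabola shown opens up or down and the hyperbola opens left and right; the exact notation used on Khan Academy may differ.

```latex
\begin{aligned}
\text{Circle:}    &\quad (x-h)^2 + (y-k)^2 = r^2 \\
\text{Ellipse:}   &\quad \frac{(x-h)^2}{a^2} + \frac{(y-k)^2}{b^2} = 1 \\
\text{Parabola:}  &\quad y - k = a\,(x-h)^2 \\
\text{Hyperbola:} &\quad \frac{(x-h)^2}{a^2} - \frac{(y-k)^2}{b^2} = 1
\end{aligned}
```

The ellipse and hyperbola differ only by the sign between the two squared terms, and a circle is the special case of an ellipse with $a = b = r$.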

The following lesson was on vectors and scalars. Scalars have only magnitude (size), such as a plain number or a distance. Vectors have both magnitude and direction, such as a distance traveled in a specific direction. Velocity is a vector because it involves both speed and direction, while speed alone is a scalar.

Okay, that’s the extent of my math lesson. I felt that I needed to explain a little of the meaning behind what I learned; otherwise, people who aren’t familiar with these concepts would be lost, and I don’t want that.

Last week, I mentioned that knowledge of vectors and matrices is required for my machine learning specialization. They’re also central to linear algebra, way down the road. Obviously, I’m only studying a very basic version of vectors in precalculus, but the concept is actually pretty simple.

Much less complicated than conic sections, anyway. We learned how to break a vector down into its components and how to find its magnitude and direction angle. Most of this involves pretty basic algebra and trigonometry. I’m not sure why I imagined it would be a lot more difficult; I couldn’t remember anything about vectors from my high school or college days.
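The conversions described above fit in a few lines of Python. This is a minimal sketch using only the standard `math` module; the function names are made up for illustration, and angles are assumed to be in degrees measured counterclockwise from the positive x-axis.

```python
import math

def to_components(magnitude, angle_degrees):
    """Break a vector given by magnitude and direction angle into (x, y) components."""
    theta = math.radians(angle_degrees)
    return magnitude * math.cos(theta), magnitude * math.sin(theta)

def to_magnitude_angle(x, y):
    """Recover a vector's magnitude and direction angle (degrees) from its components."""
    magnitude = math.hypot(x, y)            # sqrt(x**2 + y**2), the Pythagorean theorem
    angle = math.degrees(math.atan2(y, x))  # atan2 picks the correct quadrant
    return magnitude, angle

# A velocity of 3 m/s east and 4 m/s north; the speed is the vector's magnitude.
speed, heading = to_magnitude_angle(3, 4)
print(speed, heading)  # speed is 5.0; heading is about 53.13 degrees
```

Using `atan2(y, x)` instead of `atan(y / x)` matters for vectors pointing left or down, where plain arctangent would report an angle in the wrong quadrant.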

So that was a relief. I began matrices yesterday, and they’re not too bad either, although multiplying two matrices together requires a little more thought.
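Matrix multiplication takes more thought because each entry of the result is the dot product of a row from the first matrix with a column from the second, and the order of the two matrices matters. Here is a minimal sketch in plain Python, assuming matrices are stored as lists of rows; `matmul` is a name chosen for illustration.

```python
def matmul(a, b):
    """Multiply matrix a (m x n) by matrix b (n x p), both given as lists of rows."""
    if not a or len(a[0]) != len(b):
        raise ValueError("the number of columns in a must equal the number of rows in b")
    # Entry (i, j) is the dot product of row i of a with column j of b.
    return [[sum(a_ik * b_kj for a_ik, b_kj in zip(row, col)) for col in zip(*b)] for row in a]

A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
print(matmul(B, A))  # [[23, 34], [31, 46]]: swapping the order changes the result
```

For example, the top-left entry of `A` times `B` is 1·5 + 2·7 = 19.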

Advanced knowledge of vectors and matrices is pivotal to creating machine learning algorithms. I’m not sure how yet, since I haven’t studied the algorithms, but from what I’ve read, they’re important. Hopefully I’ll know enough before I begin my next machine learning class. My current knowledge is more than sufficient for the introductory class, but I have no idea how demanding future classes will be.

We’ll see soon enough! I have two “weeks” left of this class, and then I’ll move on to the next one. Despite my concerns, I’m incredibly excited about this specialization and everything I’ll be learning, not to mention all of the math I’ve learned and will learn on Khan Academy. It’s just so great that these classes exist.

Conclusion

I feel as though I go from writing the shortest journal entry to the longest, and back again. I can’t seem to remain consistent. In fairness, last week I had quite a lot to discuss, making it uncharacteristically long. I apologize for that.

I hope you are all living, or reaching for, your best lives. Life is too short not to strive for happiness. Thank you for joining me. I’ll see you next week!





JOURNAL

This journal will be about my journey to become a data scientist and better myself through education and fitness.

I hope that my words inspire you to follow your dreams and show you that it's never too late to make a change.

SCHEDULE

Data science posts every Wednesday.

Health posts every other Sunday.