November 21, 2016
Word2vec algorithm: Made as simple as possible, but no simpler
I recently gave a 3-hour workshop and a 30-minute talk on word2vec. I would be delighted to talk about it again. LMK...
November 19, 2016
My life's mission
I'm on a mission to democratize technical data skills (i.e., math, statistics, programming, and data analysis), which today are bundled together as "Data Science".
Science and technology (in the broadest sense possible) are very effective in improving the quality of life for people. When smart people solve hard problems then share their solutions with the world, the world is a better and more interesting place. I hope to help that process by creating more highly skilled people who can both ask and answer the right questions.
I have been on this path for the last 13 years. I have done this as a Lecturer, Researcher, Tutor, Data Scientist, Mentor, Professor, etc. However, my mission transcends those labels of convenience.
Previously, I saw the traditional academic system as the greatest force multiplier for myself. I could reach and help change people in a deep way through specific and personal connections. However, the incentives of that system are badly bent (possibly broken). I have turned to a nontraditional academic system, including community organizing and Open Source software. For now, that ecosystem works better to help people learn and grow in "Data Science". In the future, there will be even better systems and I'll be there.
May 12, 2014
A couple of old model fitting tricks
I picked up many modeling tricks in graduate school while working in a computational neuroscience laboratory. We were doing machine learning, but we didn't know it at the time. We called it "automated model fitting." We used custom (and very finicky) algorithms in MATLAB. Now people are blessed with scikit-learn. These ideas might help out for edge cases:
- Have good "fake" data. "Fake" data allows for testing of the algorithms. There is an art to dummy data: it should be noisy but still allow the models to converge.
- Linearly transform the data so all dimensions are within the same order of magnitude. Some models have trouble with weighting noise parameters on different scales.
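Both tricks can be sketched in a few lines of plain Python. This is only an illustration, not the lab's original MATLAB code: the function names, the "true" weights (2 and -3), the noise level, and the choice of min-max scaling with batch gradient descent are all assumptions made for the example.

```python
import random

def make_fake_data(n=200, noise=0.1, seed=0):
    """Noisy data from a known linear model, so a fit can be checked."""
    rng = random.Random(seed)
    rows, targets = [], []
    for _ in range(n):
        x1 = rng.uniform(0, 1)      # feature on the order of 1
        x2 = rng.uniform(0, 1000)   # feature on the order of 1000
        # Hypothetical ground truth: weight 2 on x1, weight -3 on x2/1000.
        y = 2.0 * x1 - 3.0 * (x2 / 1000) + rng.gauss(0, noise)
        rows.append([x1, x2])
        targets.append(y)
    return rows, targets

def rescale(rows):
    """Linearly map every column onto [0, 1] (min-max scaling)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / sp for v, lo, sp in zip(row, lows, spans)]
            for row in rows]

def fit_linear(rows, targets, lr=0.5, steps=1000):
    """Batch gradient descent on mean squared error, with an intercept."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, y in zip(rows, targets):
            err = sum(wi * xi for wi, xi in zip(w, row)) + b - y
            grad_b += err
            for j in range(d):
                grad_w[j] += err * row[j]
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

rows, targets = make_fake_data()
scaled = rescale(rows)
w, b = fit_linear(scaled, targets)
print("recovered weights:", w, "intercept:", b)
```

Because the fake data comes from a known model, the recovered weights can be compared with the ones that generated it (roughly 2 and -3 after rescaling). With the raw, unscaled columns, the same learning rate would be far too large for the second feature and the fit would likely diverge.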
May 5, 2014
My recent failures
I fail all the time. I recently failed at completing several MOOCs. I'm one of the unwashed masses that started but didn't finish.
Even by only attempting, I learned something from each course. Given the value I place on life-long learning, improvement is my measure of success. Completion is easier to quantify and a more commonly accepted measure of success. A "factory" model of education values completion. A half-completed commodity has little value. A factory stops work on a product and ships it. My professional life is continuous delivery. Any incremental improvement adds value to a continuous delivery system. Through "failing" at MOOCs, I picked up new viewpoints on existing concepts that improved my understanding of the world. I didn't get credit but I got value.
April 28, 2014
The neurobiological limits of free will
Animals have limited, possibly no, choices. Biology and circumstances are their destiny.
I, being a human, have choices. They are limited but present. Like all humans, I have deep neural ruts (i.e., habits) that guide most of my thoughts and behaviors. They are not permanently fixed boundaries; I can change them (slowly). I have the free will to choose to change those habits. It ain't easy (or quick) but possible.
Understanding the neurobiology can help. Knowing that simple repetition is more important than willpower. Knowing that old habits are easily cued, even after long periods of being dormant. Knowing that all brains have strong default systems, mostly around fear, that don't serve higher callings in the modern world. Knowing that there is never conscious access to unconscious processes, thus just trusting that providing "open space" will allow unconscious processes to manifest.
I'm humbled because I don't change even what is within my very limited ability. I "know" the rules but still don't play the game of change well.
April 21, 2014
Web 3.0 (possibly)
Web 1.0 connected ideas with webpages and hyperlinks.
Web 2.0 connects people with social media and mobile.
Web 3.0 could connect physical things with automation.
The Internet of Things (IoT), both the concept and the necessary features (e.g., infrastructure, tools, and protocols), is gaining momentum.
Each successive stage of the web relies on previous stages. The previous stage becomes the assembly language of the next stage. If static web content can be generated automatically, there is then engineering bandwidth to tackle dynamic web content.
The number of processes that can be automated continues to increase. We are now on the cusp of the next generation of web automation: the cost of aggregating information from physical items, from consumer products to industrial processes, is dropping below the return-on-investment threshold.
The potential amount of data from this transition is staggering. The previous two versions of the web each redefined "big data." We are on the cusp of another inflection point. There isn't enough engineering bandwidth to make sense of this data with static / human-based systems.
The most promising direction is automated learning, aka things that get smarter the more you use them. Previous generations of technology were static. Excel and your browser don't get any better the more they are used. However, there is now software that learns as you use it, and search algorithms can get smarter the more they are used.
Machine learning is the tool to leverage the promise of Web 3.0.
April 15, 2014
Video of my Data Science panel appearance
I was recently on a Data Science and Machine Learning for Social Good panel. I enjoyed sharing my experience working at BrightBytes. It was nice to talk with bright people who work on more than click-through rates.
(I'm 3rd from the left.)