May 12, 2014
A couple of old model fitting tricks
I picked up many modeling tricks in graduate school working in a computational neuroscience laboratory. We were doing machine learning, but we didn't know it at the time. We called it "automated model fitting." We used custom (and very finicky) algorithms in MATLAB. Now people are blessed with scikit-learn. These ideas might help out for edge cases:
- Have good "fake" data. "Fake" data allows for testing of the algorithms. There is an art to dummy data: it should be noisy but still allow the models to converge.
- Linearly transform the data so all dimensions are within the same order of magnitude. Some models have trouble weighting noisy parameters that sit on very different scales (see the sketch below).
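Here is a minimal sketch of both tricks using scikit-learn. Everything in it is invented for illustration: the feature scales, the planted coefficients, and the noise level are my assumptions, not anything from our old MATLAB code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Fake data with known, planted coefficients: noisy, but recoverable.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.normal(0.0, 1.0, n),    # feature on the order of 1
    rng.normal(0.0, 1e3, n),    # feature on the order of 1e3
    rng.normal(0.0, 1e-3, n),   # feature on the order of 1e-3
])
true_coef = np.array([2.0, 0.003, 1500.0])
y = X @ true_coef + rng.normal(0.0, 0.5, n)  # add noise

# Linearly transform all dimensions onto the same order of magnitude.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

model = LinearRegression().fit(X_scaled, y)
recovered = model.coef_ / scaler.scale_  # undo the scaling to compare
print(true_coef)   # the parameters we planted
print(recovered)   # the fit should land close to them
```

Planting known coefficients is what makes the fake data useful: if the fit can't recover them, the algorithm (or the scaling) is suspect.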
May 5, 2014
My recent failures
I fail all the time. I recently failed at completing several MOOCs. I'm one of the unwashed masses who started but didn't finish.
Even in just attempting them, I learned something from each course. Given that I personally value life-long learning, improvement is my measure of success. Completion is easier to quantify and a more commonly accepted measure of success. A "factory" model of education values completion: a half-completed commodity has little value, so a factory stops work on a product and ships it. My professional life is continuous delivery, where any incremental improvement adds value. By "failing" at MOOCs, I picked up new viewpoints on existing concepts that improved my understanding of the world. I didn't get credit, but I got value.
April 28, 2014
The neurobiological limits of free will
Animals have limited, possibly no, choices. Biology and circumstances are their destiny.
I, being a human, have choices. They are limited but present. Like all humans, I have deep neural ruts (i.e., habits) that guide most of my thoughts and behaviors. They are not permanently fixed boundaries; I can change them (slowly). I have the free will to choose to change those habits. It ain't easy (or quick), but it is possible.
Understanding the neurobiology can help. Knowing that simple repetition matters more than willpower. Knowing that old habits are easily cued, even after long dormant periods. Knowing that all brains have strong default systems, mostly built around fear, that don't serve higher callings in the modern world. Knowing that there is no conscious access to unconscious processes, so all one can do is trust that providing "open space" will allow them to manifest.
I'm humbled because I don't change even what is within my very limited ability. I "know" the rules but still don't play the game of change well.
April 21, 2014
Web 3.0 (possibly)
Web 1.0 connected ideas with webpages and hyperlinks.
Web 2.0 connects people with social media and mobile.
Web 3.0 could connect physical things with automation.
The Internet of Things (IoT), both the concept and its necessary supports (e.g., infrastructure, tools, and protocols), is gaining momentum.
Each successive stage of the web relies on the previous stages; the previous stage becomes the assembly language of the next. If static web content can be generated automatically, engineering bandwidth is freed to tackle dynamic web content.
The number of processes that can be automated continues to increase. We are now on the cusp of the next generation of web automation: the cost of aggregating information from physical items, from consumer products to industrial processes, is dropping below the return-on-investment threshold.
The potential amount of data from this transition is staggering. The previous two versions of the web each redefined "big data." We are on the cusp of another inflection point. There isn't enough engineering bandwidth to make sense of this data with static / human-based systems.
The most promising direction is automated learning, i.e., things that get smarter the more you use them. Previous generations of technology were static: Excel and your browser don't get any better the more they are used. Now, however, there is software that learns as you use it, and search algorithms get smarter the more they are queried.
Machine learning is the tool to leverage the promise of Web 3.0.
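As a toy illustration of software that gets smarter with use, here is a sketch of an online classifier that updates on every new batch of readings. The "sensor" features and labels are fabricated for the example; only the scikit-learn API calls are real.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# An online learner: each call to partial_fit refines the model in place,
# so the system improves the more data flows through it.
model = SGDClassifier(loss="log_loss")  # "log" in older scikit-learn releases
classes = np.array([0, 1])  # e.g., "normal" vs. "anomalous" device state

rng = np.random.default_rng(0)
for day in range(30):  # pretend a new stream of IoT readings arrives daily
    X_batch = rng.normal(size=(100, 4))              # fabricated sensor features
    y_batch = (X_batch.sum(axis=1) > 0).astype(int)  # fabricated labels
    model.partial_fit(X_batch, y_batch, classes=classes)
# No retraining from scratch: the model carries its state between batches.
```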
April 15, 2014
Video of my Data Science panel appearance
I was recently on a Data Science and Machine Learning for Social Good panel. I enjoyed sharing my experience working at BrightBytes. It was nice to talk with bright people who work on more than click-through rates.
(I'm 3rd from the left.)
April 14, 2014
The connection between broadsiding and blogging
On a recent visit to the National Archives, I was engrossed with a broadside display.
Broadsides are the blogs of their day, a blend of art and news.
Both have a lower barrier to entry than their respective contemporary formats (e.g., treatises or whitepapers). Both are designed to be ephemera. However, an individual exemplar sometimes resonates beyond its intended lifespan and audience. That long-term impact potential is greater for blogs. Since blogs are digital, they are searchable and sharable without limit. Since broadsides are printed, they are static and nonscalable; their size and content are limited by the properties of the physical press, and the printing presses of the broadside heyday most cheaply produced a single page.
The illusion of being ephemera makes both more accessible, thus encouraging writing (and publication). Even if the writing is intended to be disposable, more writing makes better writers.
April 7, 2014
Treating manuscripts like pieces of code
As I revise a manuscript, I find bugs (i.e., things that are not the way they should be). If a bug can be fixed in less than two minutes, I immediately correct it. Otherwise, I stack and track it via a bug list. My bug list has two contexts, full focus and brain dead (sketched in code after the examples below).
Brain dead bugs can be fixed with minimal cognitive effort. For example, fixing the axis on a figure.
Full focus bugs require cognitive horsepower. For example, synthesizing previous research.
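A toy version of that two-context bug list in Python; the class and method names are my own invention for illustration, not part of any real workflow tool.

```python
from dataclasses import dataclass, field

@dataclass
class ManuscriptBug:
    description: str
    context: str  # "brain_dead" or "full_focus"

@dataclass
class BugList:
    bugs: list = field(default_factory=list)

    def track(self, description: str, context: str) -> None:
        # Anything that takes longer than two minutes gets stacked here.
        self.bugs.append(ManuscriptBug(description, context))

    def next_for(self, context: str):
        # Pull the most recently stacked bug matching the current context.
        for bug in reversed(self.bugs):
            if bug.context == context:
                return bug
        return None

todo = BugList()
todo.track("Fix the axis on Figure 2", "brain_dead")
todo.track("Synthesize the previous research in the intro", "full_focus")
print(todo.next_for("brain_dead").description)  # -> Fix the axis on Figure 2
```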