Machine Learning is key to the new wave of AI
Artificial Intelligence (AI) is undeniably experiencing a new wave of attention, energy, and sky-high expectations. This wave is driven by the abundance of data that is generated in our connected, digital society, and by the low-barrier availability of enormous computational resources.
Among the various AI techniques, Machine Learning (ML) in particular has come to play a key role.
Learning complex behavior from examples
Machine Learning allows us to solve complex problems, not by arduously writing new code, but by letting an existing algorithm learn new behavior from examples. We are now witnessing breakthrough results in image recognition, speech processing, medical diagnostics, securities trading, autonomous driving, product design and manufacturing, and much more.
Does Machine Learning replace programming?
Does the rapid ascent of Machine Learning mean that software systems will no longer need to be programmed? Will we need data scientists instead of software developers?
To those who have experienced software-related project delays, system outages, and perpetually incomplete feature sets, a world without programmers might seem attractive.
Does Machine Learning require programming?
But not so fast. There are several reasons why Machine Learning will not replace programming, but will instead make the software engineering discipline even richer and more complex.
- ML algorithms are themselves software that needs to be developed, tested, and maintained.
- Using an ML algorithm requires programming, for the tasks of ingesting, cleaning, merging, and enhancing data, for feeding the data into the ML algorithm, for running repeated training experiments to generate, evaluate, and optimize an ML model, and for testing, integrating, deploying, and operating ML models in production systems.
- Trained ML models are just one building block in the construction of complex software systems.
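As a minimal sketch of the programming tasks listed above, the following toy example ingests a small CSV, cleans out an incomplete row, "trains" a one-parameter model from examples, and evaluates it. The data, the function names, and the threshold model are all assumptions made for illustration; a real project would use a proper ML library and a separate test set.

```python
import csv
import io

# Hypothetical raw data: hours studied vs. pass/fail, with one incomplete row.
RAW_CSV = """hours,passed
1.0,0
2.0,0
,1
3.0,0
4.0,1
5.0,1
6.0,1
"""

def ingest(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Drop rows with missing values and convert fields to numbers."""
    cleaned = []
    for row in rows:
        if row["hours"] == "" or row["passed"] == "":
            continue
        cleaned.append((float(row["hours"]), int(row["passed"])))
    return cleaned

def accuracy(threshold, examples):
    """Fraction of examples where 'hours >= threshold' matches the label."""
    correct = sum(1 for x, y in examples if int(x >= threshold) == y)
    return correct / len(examples)

def train(examples):
    """Learn from examples: pick the threshold with the best training accuracy."""
    candidates = sorted(x for x, _ in examples)
    return max(candidates, key=lambda t: accuracy(t, examples))

examples = clean(ingest(RAW_CSV))
model_threshold = train(examples)
print(model_threshold, accuracy(model_threshold, examples))  # prints: 4.0 1.0
```

Even in this toy setting, most of the code deals with data handling and evaluation rather than with the learning step itself.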
So, what is different?
Still, there are specific characteristics of Machine Learning that challenge traditional software development practices. The amount of data to manage is typically much larger for applications that involve Machine Learning components. The development process tends to involve more rapid-cycle experimentation, where alternative solutions are routinely attempted, compared, and discarded. And the level of inherent uncertainty in the final product is higher.
Around the globe, numerous organizations are learning step-by-step how to develop software systems that include ML components. With an increasing number of people self-identifying as ML Engineers, the discipline of Machine Learning Engineering is emerging. This raises interesting questions:
- Is ML Engineering distinct from Software Engineering? Or is one a sub-discipline of the other?
- Do established Software Engineering best-practices apply equally when building software systems with ML components? Or do these best-practices need to be modified or replaced?
- Can a canonical set of ML Engineering best-practices be identified to guide practitioners and educate newcomers?
Investigating ML engineering practices
To investigate these questions, researchers in the fields of Software Engineering and Machine Learning have teamed up.
Aspects of ML Engineering organized into groups of practices.
Surveying the adoption of ML Engineering practices
Early results of our global survey on the adoption of engineering practices by Machine Learning teams. Larger teams tend to adopt more practices.
Also, early results tell us that some practices are widely adopted, and can be considered basic, while other practices are only applied by more experienced teams in larger organizations, and can be considered advanced.
An example of a more advanced practice is the use of so-called automated machine learning techniques, in which model selection and hyper-parameter optimization are performed automatically. Early survey results indicate that these techniques enjoy much stronger adoption in tech companies and (academic) research labs than in non-tech companies and government.
Early results of our global survey on the adoption of engineering practices by Machine Learning teams. Teams in tech companies, universities, and non-commercial research labs tend to make much more use of automated machine learning techniques than teams in non-tech companies and governmental organizations.
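To make the idea of automated model selection and hyper-parameter optimization concrete, here is a minimal sketch that tries every configuration in a small search space and keeps the one with the best validation accuracy. This is plain exhaustive search, the simplest possible form; actual automated machine learning tools use more sophisticated strategies. The dataset, the two model families, and all function names are assumptions made for illustration.

```python
# Toy 1-D dataset of (feature, label) pairs; values are made up for illustration.
# The point (1.8, 1) is deliberate label noise.
TRAIN = [(0.5, 0), (1.0, 0), (1.5, 0), (1.8, 1), (2.0, 0),
         (3.0, 1), (3.5, 1), (4.0, 1), (4.5, 1)]
VALID = [(0.8, 0), (1.7, 0), (2.2, 0), (3.2, 1), (4.2, 1)]

def train_majority(train):
    """Model family 1: always predict the most common training label."""
    ones = sum(label for _, label in train)
    majority = int(ones * 2 > len(train))
    return lambda x: majority

def train_knn(train, k):
    """Model family 2: majority vote among the k nearest training examples."""
    def predict(x):
        nearest = sorted(train, key=lambda ex: abs(ex[0] - x))[:k]
        votes = sum(label for _, label in nearest)
        return int(votes * 2 > k)
    return predict

def accuracy(model, data):
    """Fraction of examples in data that the model labels correctly."""
    return sum(1 for x, y in data if model(x) == y) / len(data)

# Each entry: (family name, training function, hyper-parameters).
SEARCH_SPACE = [
    ("majority", train_majority, {}),
    ("knn", train_knn, {"k": 1}),
    ("knn", train_knn, {"k": 3}),
    ("knn", train_knn, {"k": 5}),
]

def auto_select(search_space, train, valid):
    """Train every configuration and return the best by validation accuracy."""
    results = []
    for name, trainer, hyper in search_space:
        model = trainer(train, **hyper)
        results.append((accuracy(model, valid), name, hyper))
    # max() keeps the first of several equally good configurations.
    best = max(results, key=lambda r: r[0])
    return best, results

best, results = auto_select(SEARCH_SPACE, TRAIN, VALID)
print(best)  # prints: (1.0, 'knn', {'k': 3})
```

On this toy data the noisy training point misleads the nearest-neighbour model with k=1, so the search settles on k=3 without anyone tuning it by hand.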
Towards an ML Engineering best-practice catalogue
We are using the results of our survey to organize the best practices into a comprehensive catalogue. In the catalogue, each ML engineering practice is recorded in a uniform structure, much like design patterns and refactorings have been catalogued in the past.
Elements of the structure include the intent and motivation of the practice, its applicability in various contexts, the interdependencies with other practices, and a short and actionable description of how to apply the practice. We also provide references to literature and supporting tools.
Using the survey results, we are also able to quantify the difficulty of each practice. This lets us sort the practices into difficulty levels from basic to advanced, which helps teams prioritize their adoption.
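The sketch below shows one way such a catalogue entry and its difficulty level could be represented. It is illustrative only: the field names follow the elements listed above, but using the adoption rate as a proxy for difficulty, the cutoff values, the "intermediate" level, the example entries, and the adoption numbers are all assumptions, not survey results.

```python
from dataclasses import dataclass, field

@dataclass
class Practice:
    """One catalogue entry in a uniform structure (hypothetical schema)."""
    title: str
    intent: str
    motivation: str
    applicability: str
    description: str
    related_practices: list = field(default_factory=list)  # interdependencies
    references: list = field(default_factory=list)         # literature and tools
    adoption_rate: float = 0.0  # assumed: share of surveyed teams applying it (0..1)

def difficulty_level(adoption_rate, basic_cutoff=0.6, advanced_cutoff=0.3):
    """Assumed proxy: the fewer teams adopt a practice, the harder it is."""
    if adoption_rate >= basic_cutoff:
        return "basic"
    if adoption_rate >= advanced_cutoff:
        return "intermediate"
    return "advanced"

# Placeholder entries with made-up adoption rates.
catalogue = [
    Practice(
        title="Automate hyper-parameter optimization",
        intent="Find good model settings without manual tuning",
        motivation="Manual tuning is slow and hard to reproduce",
        applicability="Models with several tunable hyper-parameters",
        description="Define a search space and let a tool explore it",
        adoption_rate=0.25,
    ),
    Practice(
        title="Keep training scripts under version control",
        intent="Make experiments traceable",
        motivation="Results cannot be reproduced without the exact code",
        applicability="Any team that trains models",
        description="Commit training code alongside application code",
        adoption_rate=0.85,
    ),
]

# Most widely adopted (assumed easiest) practices first.
by_difficulty = sorted(catalogue, key=lambda p: p.adoption_rate, reverse=True)
for practice in by_difficulty:
    print(difficulty_level(practice.adoption_rate), "-", practice.title)
```

Sorting by adoption gives a team a rough order in which to take up practices, starting with the ones most other teams already manage to apply.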
Our ultimate objective is that the resulting catalogue will help the formation and effectiveness of ML Engineering teams, not only in the larger tech companies where ML Engineering already enjoys strong adoption, but also in smaller and non-tech organizations.
Take the survey!
If you are part of a team that builds software that includes Machine Learning components, please help us by taking our survey.