AI in Practice, With and Without Data

“What makes a problem a problem is not that a large amount of search is required for its solution, but that a large amount would be required if a requisite level of intelligence were not applied.”

–Allen Newell, one of the fathers of AI in the 1950s

“What is AI?” is an old question with no answer that researchers and practitioners agree upon. The reason is that, since its inception, AI has attracted the attention not only of computer scientists but also of neuroscientists, economists, and social scientists, who all participated in shaping it. AI is now a big field that spans science, technology, engineering, and mathematics (STEM), with impacts on our private, professional, and social lives.

In this paper, I define AI as a set of engineering methods to harness complexity. Examples include optimizing the advertising budget of a big brand across TV, outdoor, radio, print, and digital media worldwide; forecasting the sales of a new product in each region where it will be marketed for the first time; and personalizing discount coupons so that customers save money, the brand raises sales, and merchants receive foot traffic.

Before explaining how AI can be used in practice, there are two important things to know. First, no real-world AI system is static. It needs the continuous attention of an AI engineer to review results and tune parameters and algorithms. Without that feedback loop, an AI system will be unreliable at best and will probably fail.

Second, no single method works for every situation. This explains the existence of a large set of methods, each with its aficionados and critics. So, rather than listing all the methods (for which I recommend Stuart Russell and Peter Norvig’s seminal 1,000+ page book on AI), I prefer to group the real situations into five categories that are common to industry, business, and government.

The first situation, where data is available in quantity, is the one under the news spotlight everywhere. The methods used here go by the names of data mining, machine learning, neural networks, or deep learning, depending on the volume of data at hand, the sophistication of the architecture used, and also the background of the researchers or practitioners. There are three sorts of learning, from the least sophisticated to the most sophisticated: supervised learning, unsupervised learning, and reinforcement learning.

Data-based methods work well in situations where newly observed data do not deviate too much from the data already learned. In particular, data-intensive methods have shown astonishing results in image, speech, and language understanding, and also in gaming. In fact, they are the quintessential implementation of what Nobel laureate in economics Daniel Kahneman refers to as System 1 in his theory of the mind. According to this theory, the mind is composed of two systems: System 1 governs our perception and classification, and System 2 governs our reasoning and planning.
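The condition that new data should not deviate too much from old data can be checked in code. The sketch below is one simple, assumed way to do it: it measures how many standard errors the mean of recent values sits from the mean of the training values. The function names `mean_shift_score` and `has_drifted`, the threshold of 3.0, and the sample figures are illustrative assumptions, not something this article prescribes.

```python
import math
import statistics


def mean_shift_score(train_values, new_values):
    """Return how many standard errors the new mean is from the training mean."""
    train_mean = statistics.mean(train_values)
    train_std = statistics.stdev(train_values)  # sample standard deviation
    new_mean = statistics.mean(new_values)
    if train_std == 0:
        # Constant training data: any change at all counts as an infinite shift.
        return 0.0 if new_mean == train_mean else math.inf
    standard_error = train_std / math.sqrt(len(new_values))
    return abs(new_mean - train_mean) / standard_error


def has_drifted(train_values, new_values, threshold=3.0):
    """Flag drift when the shift exceeds the (assumed) threshold."""
    return mean_shift_score(train_values, new_values) > threshold


# Hypothetical daily sales figures, used only for illustration.
train = [100, 102, 98, 101, 99, 103, 97, 100]
similar = [101, 99, 100, 102]   # close to the training data
shifted = [120, 118, 125, 122]  # far from the training data

print(has_drifted(train, similar))
print(has_drifted(train, shifted))
```

A check of this kind only looks at one feature and one statistic; a production system would typically track several.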

As machine learning and deep learning got more attention, questions were raised about what to do in situations with insufficient or poor-quality data. Fortunately, we aren’t helpless, as there are other methods that handle situations with little or no data well.

We tend to forget that industries and businesses solved complex problems before the advent of machine learning and deep learning. Researchers and practitioners used equations and constraints to find the price, package, and place that would maximize their profit given a constrained budget; to sequence the steps of a media plan that produces the maximum audience reach; or to develop the go-to-market strategy of a new product that minimizes the brand’s launch risk.

Some readers may argue that this is not modern AI, but we should not forget that all deep learning is about minimizing functions or maximizing likelihoods. Indeed, at its foundations, deep learning is linear algebra, probability calculus, and mathematical optimization.

Here again, the methods go by different names: operations research, decision science, or mathematical optimization. Whatever the name, they all start with a specification of the situation that details the parameters, constants, objectives, and constraints. And there is not one but many alternative methods, depending on whether we have one or multiple objectives, integer- or real-valued parameters, and logical or mathematical constraints.
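To make the idea of a situation specification concrete, here is a minimal sketch of a single-objective problem with integer-valued parameters. Everything in it is a hypothetical illustration: the channel names, the reach tables, the budget, and the minimum-digital constraint are invented, and exhaustive search stands in for the dedicated solvers a real project would use.

```python
from itertools import product

# Constants (hypothetical): cumulative reach for 0..4 budget units per channel.
REACH = {
    "tv": [0, 50, 90, 120, 140],
    "radio": [0, 28, 52, 72, 88],
    "digital": [0, 40, 75, 105, 130],
}
TOTAL_BUDGET = 6  # constraint: at most 6 budget units overall
MIN_DIGITAL = 1   # constraint: at least 1 unit on digital


def objective(allocation):
    """Objective to maximize: total reach of an allocation."""
    return sum(REACH[channel][units] for channel, units in allocation.items())


def is_feasible(allocation):
    """Check both constraints for an allocation."""
    within_budget = sum(allocation.values()) <= TOTAL_BUDGET
    return within_budget and allocation["digital"] >= MIN_DIGITAL


def solve():
    """Enumerate every integer allocation and keep the best feasible one."""
    channels = list(REACH)
    best_allocation, best_value = None, float("-inf")
    for units in product(range(5), repeat=len(channels)):
        allocation = dict(zip(channels, units))  # the parameters being chosen
        if is_feasible(allocation) and objective(allocation) > best_value:
            best_allocation, best_value = allocation, objective(allocation)
    return best_allocation, best_value


print(solve())
```

Enumeration works here because there are only 125 candidate allocations; with real-valued parameters or several objectives, a different method would be needed.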

Modeling optimization problems with mathematical formulas can be challenging, as the problem at hand is often intractable. This was the reason for the emergence of knowledge- and heuristics-based approaches, which search not for an optimal solution but for a satisfactory one.

Knowledge- and heuristics-based approaches were all the rage in the 1980s and 1990s. Contrary to common belief, they solved many problems where knowledge could be easily modeled, such as alarm correlation in network management, product configuration in B2B sales, and regulation checking in private investment banking. Nowadays, they are hidden in the lower layers of the middleware that constitutes the backbone of large industrial, business, and financial systems.

In a knowledge-based system, knowledge is domain-specific and encoded in the form of predicate-statement or condition-procedure rules. An inference engine is in charge of checking all the predicates or conditions and firing the statements or procedures. To select which statement to add or which procedure to run, the inference engine relies on heuristics that are also domain-specific. When problems are more complex, we can assemble the rules into separate, self-contained knowledge sources and add a set of meta-rules whose role is to select the knowledge sources to execute when certain conditions arise in the global solution.
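The inference loop described above can be sketched in a few lines. This is a toy forward-chaining engine, not a reproduction of any particular product: the alarm-correlation rules, the fact names, and the "most specific rule first" heuristic are all assumptions chosen for illustration.

```python
def forward_chain(facts, rules):
    """Fire rules until no rule adds a new fact; return all known facts."""
    facts = set(facts)
    # Heuristic (assumed): try rules with the most conditions first.
    ordered_rules = sorted(rules, key=lambda rule: -len(rule[0]))
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in ordered_rules:
            # A rule fires when all its conditions are known facts.
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts


# Hypothetical alarm-correlation rules: (conditions, conclusion).
RULES = [
    (frozenset({"link_down", "router_unreachable"}), "router_failure"),
    (frozenset({"router_failure"}), "suppress_downstream_alarms"),
    (frozenset({"high_latency"}), "check_congestion"),
]

derived = forward_chain({"link_down", "router_unreachable"}, RULES)
print(sorted(derived))
```

Meta-rules could be layered on top by grouping rules into named sets and letting a second rule base decide which set to pass to `forward_chain`.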

For instance, with all the knowledge we have accumulated, we can use rules and meta-rules to select which machine learning or deep learning algorithms to use depending on the quantity and quality of data at hand.

To quote Daphne Koller, “the world is noisy and messy” and we need to deal with noise and uncertainty, even when data is available in quantity. Here, we enter the domain of probability theory and the best set of methods to consider is probabilistic graphical models where you model the subject under consideration. There are three kinds of probabilistic graphical models, from the least sophisticated to the most sophisticated: Bayesian networks, Markov networks, and hybrid networks.

In these methods, you create a model that captures all the relevant general knowledge about the subject in quantitative, probabilistic terms, such as the cause-effect network of a troubleshooting application. Then, for a particular situation, you apply the model to any specific data you have to draw conclusions from.
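As a small illustration of applying a cause-effect model to a specific observation, the sketch below builds a three-variable Bayesian network for a troubleshooting case and computes a posterior by enumeration. The variables, the probabilities, and the function names are invented for the example; real applications would normally rely on a probabilistic programming language or a dedicated library instead.

```python
from itertools import product

# Hypothetical prior probabilities of the two causes.
P_BATTERY_DEAD = 0.10
P_NO_FUEL = 0.05

# Hypothetical P(car does not start | battery_dead, no_fuel).
P_NO_START = {
    (True, True): 1.00,
    (True, False): 0.95,
    (False, True): 0.90,
    (False, False): 0.01,
}


def joint(battery_dead, no_fuel, no_start):
    """Joint probability of one full assignment, factored along the network."""
    p = P_BATTERY_DEAD if battery_dead else 1 - P_BATTERY_DEAD
    p *= P_NO_FUEL if no_fuel else 1 - P_NO_FUEL
    p_effect = P_NO_START[(battery_dead, no_fuel)]
    p *= p_effect if no_start else 1 - p_effect
    return p


def posterior_battery_dead(no_start=True):
    """P(battery_dead | no_start), summing out the unobserved fuel variable."""
    numerator = sum(joint(True, fuel, no_start) for fuel in (True, False))
    evidence = sum(
        joint(battery, fuel, no_start)
        for battery, fuel in product((True, False), repeat=2)
    )
    return numerator / evidence


print(round(posterior_battery_dead(), 3))
```

Observing that the car does not start raises the belief in a dead battery well above its prior, which is the kind of conclusion the general model lets you draw from specific data.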

As with machine learning and deep learning, you can use probabilistic graphical models to learn from past events to better predict future events. But probabilistic graphical models add a capability that neither machine learning nor deep learning has: they can infer the cause of events and therefore implement what Judea Pearl calls a causality engine.

Until recently, probabilistic graphical models stayed off the news radar, apart from specialized journals and conferences, even as they made huge progress thanks to the appearance of probabilistic programming languages. These new languages turn probabilistic graphical models into programs accessible to anyone with a background in procedural, object-oriented, or functional programming.

The mathematician and philosopher Descartes proposed dividing every complex problem into manageable parts. In 1969, Nobel laureate in economics Herbert Simon continued the same line of reasoning and proposed decomposing complex systems into a hierarchy of manageable sub-systems. Though centuries apart, these two thinkers and others who followed them were at the origin of the procedural reasoning, blackboard, and multi-agent approaches, used in AI in situations where there is no large, high-quality data set, no known model, no prior knowledge, and no tractable mathematical solution.

The three approaches divide the problem into manageable pieces and allocate each piece to a knowledge area (in a procedural reasoning system), a knowledge source (in a blackboard system), or an agent (in a multi-agent system), which works on it and shares its results with the others until they arrive at a solution together. In a procedural reasoning or blackboard system, communication between the knowledge areas or knowledge sources takes place through a multi-level database they share. In a multi-agent system, communication takes the form of messages the agents exchange directly with one another. A knowledge area or knowledge source could be any executable code, for example a signal processing program, an image segmentation program written in a different language, or a tree search program. Indeed, an agent could itself be a full procedural reasoning or blackboard system.
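The blackboard idea can be sketched as follows. This is a deliberately tiny illustration under stated assumptions: the shared database is a plain dictionary, the three knowledge sources and their names are hypothetical, and the control strategy (the first knowledge source able to contribute fires in each cycle) is just one simple choice among many.

```python
def ks_tokenize(blackboard):
    """Knowledge source: turn raw text into tokens."""
    if "text" in blackboard and "tokens" not in blackboard:
        blackboard["tokens"] = blackboard["text"].lower().split()
        return True
    return False


def ks_count(blackboard):
    """Knowledge source: count token occurrences."""
    if "tokens" in blackboard and "counts" not in blackboard:
        counts = {}
        for token in blackboard["tokens"]:
            counts[token] = counts.get(token, 0) + 1
        blackboard["counts"] = counts
        return True
    return False


def ks_top_word(blackboard):
    """Knowledge source: post the most frequent token as the solution."""
    if "counts" in blackboard and "solution" not in blackboard:
        counts = blackboard["counts"]
        blackboard["solution"] = max(counts, key=counts.get)
        return True
    return False


def run(blackboard, knowledge_sources, max_cycles=10):
    """Control loop: stop on a solution or when no source can contribute."""
    for _ in range(max_cycles):
        if "solution" in blackboard:
            break
        # any() stops at the first knowledge source that reports a contribution.
        if not any(source(blackboard) for source in knowledge_sources):
            break
    return blackboard.get("solution")


board = {"text": "plan the media plan then review the plan"}
# The sources are listed out of order on purpose: the blackboard state,
# not the list order, decides which one can act.
result = run(board, [ks_top_word, ks_count, ks_tokenize])
print(result)
```

In a multi-agent variant, the shared dictionary would be replaced by messages passed directly between the agents.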

Nowadays, deep learning gets the spotlight, but the reality is situation 5. For instance, AlphaGo is a feat of combining machine learning, tree search, and expert knowledge. And Netflix’s recommendation system is not one algorithm but a multitude of algorithms.

Interestingly, whatever the situation, the methods require a certain amount of human work.

In situation 1 (machine learning and deep learning), you have to split data into training, validation, and test sets; select a model and fine-tune hyper-parameters; run the model; and repeat until you are satisfied with the results. A common rule of thumb states that up to 80% of the effort in a machine learning project goes to data collection, auditing, cleaning, and unification. And once the application is in production, you have to monitor its performance, as new data may deviate from the historical data on which the application was built.
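The split, tune, and evaluate loop above can be sketched with nothing but the standard library. The data here is synthetic, the model is a one-dimensional k-nearest-neighbors classifier written by hand, and the 60/20/20 split and candidate values of `k` are assumptions picked for illustration; in practice a library such as scikit-learn would handle most of this.

```python
import random


def split(rows, train=0.6, validation=0.2, seed=0):
    """Shuffle a copy of the rows and cut it into train/validation/test sets."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_train = round(len(rows) * train)
    n_val = round(len(rows) * validation)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


def knn_predict(train_rows, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train_rows, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def accuracy(train_rows, eval_rows, k):
    """Share of evaluation rows predicted correctly."""
    hits = sum(knn_predict(train_rows, x, k) == y for x, y in eval_rows)
    return hits / len(eval_rows)


# Synthetic data (assumption): label is 1 when x > 50, with 10% label noise.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.uniform(0, 100)
    label = 1 if x > 50 else 0
    if rng.random() < 0.1:
        label = 1 - label
    data.append((x, label))

train_rows, val_rows, test_rows = split(data)

# Tune the hyper-parameter k on the validation set only.
candidates = [1, 3, 5, 9, 15]
best_k = max(candidates, key=lambda k: accuracy(train_rows, val_rows, k))

# Report the final score on the untouched test set.
print(best_k, accuracy(train_rows, test_rows, best_k))
```

Keeping the test set out of the tuning step is what makes the final score an honest estimate of performance on new data.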

In situation 2 (mathematical optimization), you have to frame and structure the problem at hand into parameters, constants, and constraints, then select and run an algorithm on them. Often, the algorithm does not converge, in which case you have to relax some of the constraints and repeat the optimization process until you find an acceptable solution to the initial problem.
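The relax-and-repeat loop can be illustrated with a toy media plan. In this sketch the failure mode is an infeasible problem rather than a solver that fails to converge, which is a simplification; the costs, reach figures, budget, and relaxation step are all hypothetical.

```python
from itertools import product

# Hypothetical constants: cost and reach per unit bought on each channel.
COST = {"tv": 5, "radio": 2, "digital": 3}
REACH = {"tv": 40, "radio": 10, "digital": 20}
BUDGET = 20


def cheapest_plan(min_reach, max_units=4):
    """Return (plan, cost, reach) for the cheapest feasible plan, or None."""
    best = None
    for units in product(range(max_units + 1), repeat=len(COST)):
        plan = dict(zip(COST, units))
        cost = sum(COST[channel] * n for channel, n in plan.items())
        reach = sum(REACH[channel] * n for channel, n in plan.items())
        feasible = cost <= BUDGET and reach >= min_reach
        if feasible and (best is None or cost < best[1]):
            best = (plan, cost, reach)
    return best


def solve_with_relaxation(min_reach, step=20):
    """Lower the reach target step by step until a feasible plan exists."""
    while min_reach > 0:
        result = cheapest_plan(min_reach)
        if result is not None:
            return min_reach, result
        min_reach -= step  # relax the constraint and try again
    return 0, cheapest_plan(0)


relaxed_target, (plan, cost, reach) = solve_with_relaxation(200)
print(relaxed_target, plan, cost, reach)
```

Which constraint to relax, and by how much, is a business decision; the code only automates the repetition.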

In situation 3 (knowledge-based systems), you have to extract domain knowledge from experts and encode it into rules and meta-rules. As the world is never easy to capture in one step, you often have to modify rules or add new ones. Once the number and heterogeneity of the rules cross a certain level of complexity, there is no choice but to organize the rules into manageable knowledge sources with a sophisticated control mechanism such as the one used in blackboard systems.

In situation 4 (probabilistic graphical models), you have to define the variables, their probabilities, the relations between them, and any cause-and-effect links. Probabilistic programming languages help, but it is up to you to specify whether the variables are discrete or continuous, whether the network is a Bayesian or a Markov network, and which prior probabilities you have.

In situation 5 (procedural reasoning, blackboard, and multi-agent systems), you have to decide on the architecture, whether it should be centralized or decentralized, and the type and number of knowledge areas, knowledge sources, or agents. If you think building a deep learning or a probabilistic reasoning system is complex, building a procedural reasoning, blackboard, or multi-agent system is even more complex. In fact, the complexity here is compounded. Building such systems is like implementing a big part of Stuart Russell and Peter Norvig’s book. That is why artificial general intelligence (AGI) is a daunting endeavor that we shouldn’t expect to deliver results in the near term, beyond some advanced research labs.

AI is an old field that makes progress in waves. It is not a hard science like mathematics or physics; it is not a soft science like psychology or sociology; it crosses both types of science, and it is still evolving. This is one of the main reasons why it is still hard to capture in a single definition on which all researchers and practitioners agree.

Data-hungry machine learning and deep learning methods have made huge progress, especially thanks to open source platforms such as scikit-learn, Keras, TensorFlow, and PyTorch. More will come as researchers work to combine machine learning with machine reasoning by extending programming languages to support causal inference. The most advanced probabilistic programming languages (PPLs) include Figaro (embedded in Scala), Gen (embedded in Julia), Pyro (built on Python), and Edward (also built on Python). It will not be a surprise to see, in the near future, new platforms that help us build more powerful AI applications, whatever the situation at hand. Indeed, we may even see the rebirth of other theories such as The Society of Mind by Marvin Minsky and Cellular Automata by Stephen Wolfram. Things are changing fast. In the meantime, as rules of thumb, proceed as follows:

For readers who want to dive into the technical details behind the article, here are the sources listed in the order of their mention in the article:

1. Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig: Covers all the subjects of AI from perception to reasoning, to action, and to learning.

2. Cracked It! by Bernard Garrette, Corey Phelps, and Olivier Sibony: My favorite book on how to frame, structure, and solve complex problems, and explain and sell solutions.

3. Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville: The textbook on deep learning; requires a strong background in mathematics, especially linear algebra, probability calculus, and function optimization.

4. Thinking, Fast and Slow by Daniel Kahneman: Summarizes in plain language Daniel Kahneman and Amos Tversky’s theory that splits the mind into System 1 (reactive, emotional) and System 2 (deliberative, rational).

5. Algorithms for Optimization by Mykel Kochenderfer and Tim Wheeler: Complete coverage of mathematical optimization presented in an easy way, with code in the modern language Julia; requires a background in mathematics.

6. Unified Theories of Cognition by Allen Newell: My favorite book on production rules, the foundation of symbolic AI, by one of the fathers of AI in the 1950s; no mathematics background is required.

7. Probabilistic Graphical Models by Daphne Koller and Nir Friedman: Contains everything you want to learn about probability calculus, Bayesian networks, Markov networks, and probabilistic inference.

8. Blackboard Architectures and Applications by V. Jagannathan, Rajendra Dodhiwala, and Lawrence Baum: Still the reference on the blackboard-based architecture for building systems with heterogeneous knowledge sources.

9. The Handbook of Artificial Intelligence, Volume IV by Avron Barr, Paul Cohen, and Edward Feigenbaum: Still my favorite source for multi-agent systems.

10. The Society of Mind by Marvin Minsky: Uses the term agent, but in this book agents are much simpler. In one sense, they are to symbolic AI what nodes are to a neural network.

The intent of this article is to help practitioners grasp the different flavors of AI through five situations. Although based on previous research, consulting, and implementation work, the views, thoughts, and opinions expressed in this article belong solely to the author, and not to the author’s current or previous clients or employers. The author welcomes comments; please post them here or on LinkedIn.

Consultant in strategy implementation with more than 15 years of experience in digital and data technologies for business transformation. Holds a PhD in AI for complex pattern recognition, problem-solving, and decision-making.