Machine learning: An introduction - Transcript
(The Statistics Canada symbol and Canada wordmark appear on screen with the title: "Machine learning: An introduction")
Machine learning: An introduction
Welcome to Machine Learning: An introduction. Here we'll explain the basic concepts of machine learning and include a framework for responsible machine learning processes.
Learning goals
This video is recommended for those who already have some familiarity with the concepts and techniques associated with computer programming and using algorithms to analyze data. One important distinction we'll make in this video is the difference between data science, artificial intelligence and machine learning. You'll learn what machine learning can be used for, how it works and some different methods for doing it. You'll also learn how to build and use machine learning processes responsibly.
Steps of a data journey
(Text on screen: Supported by a foundation of stewardship, metadata, standards and quality)
(Diagram of the Steps of the data journey: Step 1 - define, find, gather; Step 2 - explore, clean, describe; Step 3 - analyze, model; Step 4 - tell the story. The data journey is supported by a foundation of stewardship, metadata, standards and quality.)
This diagram is a visual representation of the data journey from collecting the data to exploring, cleaning, describing and understanding the data, to analyzing the data and lastly, to communicating with others the story the data tell.
Steps 1, 2 & 3: Define, find, gather; Explore, clean and describe; Analyze and model
(Diagram of the Steps of the data journey with an emphasis on Step 1 - define, find, gather; Step 2 - explore, clean and describe; Step 3 - analyze and model)
Machine learning can be used at the define, find and gather step in the data journey to search through data and find only the parts that are needed. It can also be used at the explore, clean and describe step in the data journey to reveal what's in the data. And finally, machine learning can be used at the analyze and model step in the data journey to find relationships between variables and predict outcomes or future events.
What is data science?
(Chart containing 3 intersecting circles. The top orange, left green and right blue circles represent Domain Expertise, Computer Science and Mathematics, respectively. The orange-green, green-blue and blue-orange intersections represent Data Processing, Machine Learning and Statistical Research, respectively. The intersection of all three circles represents Data Science.)
First, what is data science exactly? It's the intersection of three things: expertise in a particular domain, computer programming skills, and mathematics and statistics. Data scientists, computer scientists, statisticians and other types of scientists can all use machine learning in their work. Data science techniques such as artificial intelligence and machine learning are used to solve analytically complex problems.
What are artificial intelligence and machine learning?
Artificial intelligence, or A.I., is an area of study within the field of computer science dedicated to solving problems commonly associated with human intelligence, such as memory, problem solving and pattern recognition. One example of A.I. would be a computer which has been programmed to recognize all possible sequences of moves in order to play the game of chess. Machine learning, or M.L., on the other hand, is a subset of A.I. where the computer learns without having been programmed for specific tasks. Instead of having lines of code telling the computer exactly what to do, in machine learning, the computer learns patterns in data and applies those patterns to predict an outcome. So when playing chess, a computer is not randomly choosing a move after assessing all possible options, but rather it's using data gathered from millions of previously played games not just to ensure that its move is valid, but to ensure the sequence is most likely to result in a win.
Why use machine learning?
Machine learning is a tool that allows for the development, adjustment and fine-tuning of complex models in order to make more accurate predictions using high volumes of data. Think of it like a human brain: as it receives more data, the model improves and can draw better conclusions, leading to stronger predictions. Machine learning is also used to automate repetitive and tedious tasks that would otherwise take many hours to complete, such as sorting and categorizing online news articles.
How machine learning algorithms "learn"
Two ways that machine learning algorithms learn to predict an outcome are supervised and unsupervised learning. In supervised learning, we give the algorithm a mapping of inputs to the desired outcomes. The algorithm tries to figure out the relationship between them, so that for subsequent inputs, it can predict outcomes following the same logic as in the original mapping. An important requirement in supervised learning is to have data where both the inputs and the outcomes are known. This is called labeled data. In unsupervised learning, we don't have data with the inputs and desired outcomes. Here, the algorithm looks for similarities and patterns in the data and tries to determine a strategy for categorizing the inputs. The algorithm will apply the same strategy to categorize subsequent inputs. We'll see an example of each of these on the next two slides.
Supervised learning: Determining crop type on satellite images
(Satellite image of an agricultural area containing farm fields)
Here we see an example of using supervised machine learning to predict crop type in satellite images. On the right-hand side, you see an actual satellite image of farmers' fields. This is the input. The first step is to identify what portions in the image are crops versus something else, such as roads, water, fences or trees, and then to identify each different type of crop. These are the outcomes. This first step has to be done by a person. The second step is to create a machine learning algorithm that reads in the satellite image and the correct label for what's in every spot of the image. From this, the algorithm learns to identify crops by how they appear on the image, for example by their color and density. Finally, the algorithm reads an image it's never seen before and tries to predict which crops are there, based on what it learned in the second step.
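To make the three steps concrete, here is a minimal sketch in Python. It is not the method used for the satellite example. It assumes each spot in the image has already been reduced to two made-up numbers (greenness and density), and it uses a simple nearest-centroid rule chosen only for illustration. All names and values are hypothetical.

```python
from collections import defaultdict
from math import dist

# Step 1 (done by a person): hypothetical labeled spots from an image.
# Each entry is ((greenness, density), label). Values are invented.
labeled_pixels = [
    ((0.90, 0.80), "canola"),
    ((0.85, 0.75), "canola"),
    ((0.40, 0.30), "wheat"),
    ((0.45, 0.35), "wheat"),
    ((0.10, 0.05), "road"),
    ((0.15, 0.10), "road"),
]

def train(samples):
    """Step 2: learn one average feature vector (centroid) per label."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(column) for column in zip(*rows))
        for label, rows in grouped.items()
    }

def predict(centroids, features):
    """Step 3: give a new spot the label of the closest centroid."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

centroids = train(labeled_pixels)
print(predict(centroids, (0.88, 0.79)))  # canola
print(predict(centroids, (0.12, 0.08)))  # road
```

The key point is that the labels come from a person, and the algorithm only learns how inputs relate to those labels.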
Unsupervised learning: Detecting credit card fraud
In this example, we see how an unsupervised machine learning algorithm can be used to sort out fraudulent transactions from all the legitimate ones. As a first step, all transactions, for a period of time, are passed through the algorithm. The algorithm looks at many different attributes of each transaction, such as the date, the amount, the location, type of store and type of product or service that was purchased. Then the algorithm is asked to sort the transactions into groups. In this case, we believe that fraud is a rare event, so we would expect a very small percentage of transactions to be separated out from the rest. Remember, in unsupervised learning, we don't know in advance which transactions are legitimate and which are fraudulent. The next few slides will introduce you to some machine learning methods. We don't cover all of them here in this short video.
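The grouping idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a real fraud detection system: the transactions are invented, only two attributes are used, and the grouping rule is a basic k-means loop with two groups. In practice the attributes would also need to be put on comparable scales first.

```python
from math import dist

# Hypothetical transactions: (amount in dollars, distance from home in km).
# There are no labels saying which ones are fraudulent.
transactions = [
    (25.0, 2.0), (40.0, 5.0), (18.5, 1.0), (60.0, 8.0),
    (32.0, 3.0), (45.0, 4.0), (22.0, 2.5), (55.0, 6.0),
    (2400.0, 4200.0), (3100.0, 3900.0),
]

def mean(points):
    """Average position of a list of points."""
    return tuple(sum(column) / len(column) for column in zip(*points))

def two_means(points, iterations=10):
    """Split points into two groups with a basic k-means loop (k = 2)."""
    # Deterministic start: the first point and the point farthest from it.
    centers = [points[0], max(points, key=lambda p: dist(p, points[0]))]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for point in points:
            nearest = 0 if dist(point, centers[0]) <= dist(point, centers[1]) else 1
            groups[nearest].append(point)
        # Move each center to the middle of its group (keep it if the group is empty).
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return groups

groups = two_means(transactions)
small_group = min(groups, key=len)  # the rare group, to be reviewed by a person
print(small_group)
```

The algorithm only separates the unusual transactions from the rest. Deciding whether they are actually fraudulent still requires follow-up.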
Machine learning methods: Image processing
One machine learning method is image processing. We already saw how this works in the satellite image and crop type example. This method is used to extract information from images, find patterns, segment an image or compress an image, so it takes up less storage space.
Machine learning methods: Natural language processing
Natural language processing is a method of translating between computer and human languages. The goal of natural language processing is to get a computer to read a line of text and understand the meaning, just as a person would. An example is a chatbot. It expects people to type "how do I" or "I can't find" followed by keywords that refer to things one should be able to do or find on that particular website and then it provides the appropriate response. With each interaction the chatbot has, it learns to be more and more sophisticated in how it interprets what people type and how it phrases its responses.
Machine learning methods: Sentiment analysis
Sentiment analysis is a machine learning method that interprets the emotions within text to measure the inclination of people's opinions, whether they're positive, negative or neutral. An example is reading and interpreting people's sentiments from reviews of dining experiences in restaurants.
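As a rough sketch of the idea, the Python below learns a score for each word from a handful of made-up labeled reviews, then adds up the scores of the words in a new review. The reviews, the scoring rule and the function names are all assumptions for illustration. Real sentiment analysis uses far more data and more careful models.

```python
from collections import Counter

# Hypothetical labeled restaurant reviews (invented for illustration).
training_reviews = [
    ("the food was delicious and the service was friendly", "positive"),
    ("delicious pasta and a friendly waiter", "positive"),
    ("the soup was cold and the service was slow", "negative"),
    ("slow kitchen and cold fries", "negative"),
]

def learn_word_scores(reviews):
    """Add 1 to a word's score each time it appears in a positive review,
    and subtract 1 each time it appears in a negative review."""
    scores = Counter()
    for text, label in reviews:
        step = 1 if label == "positive" else -1
        for word in text.split():
            scores[word] += step
    return scores

def classify(scores, text):
    """Sum the learned scores of the words in a new review."""
    total = sum(scores[word] for word in text.split())  # unseen words count as 0
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"

scores = learn_word_scores(training_reviews)
print(classify(scores, "friendly staff and delicious dessert"))  # positive
print(classify(scores, "cold coffee and slow service"))          # negative
print(classify(scores, "the table was by the window"))           # neutral
```

Words that appear equally often in positive and negative reviews, such as "the" and "was", end up with a score of zero and do not affect the result.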
Machine learning methods: Deep learning
Have you ever been shown an image and it's all fuzzy and you're supposed to guess what it is? Then gradually the resolution gets better and better. So first, you know, it's a person and then you see, oh, it's a woman, and then you recognize the unique physical characteristics that differentiate your sister from a stranger, even if they have the same height, hair and eye color. That's how deep learning works. The algorithm makes many passes over the same data, gaining precision each time until it can predict what the image actually is. It works using structures of interconnected nodes that imitate the workings of a human brain. An example of deep learning is self-driving cars. The onboard cameras are constantly feeding deep learning algorithms in the car's computer that analyze and interpret the images of its surroundings and adjust the speed and direction of travel so as to avoid collisions.
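The idea of making many passes over the same data and gaining precision each time can be sketched with a single artificial node. This is deliberately not a deep network, which would connect many such nodes in layers. The data, learning rate and number of passes below are assumptions chosen for illustration.

```python
from math import exp

# Hypothetical inputs (one number each) with known 0/1 outcomes.
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]

def sigmoid(z):
    """Squash any number into the range 0 to 1."""
    return 1.0 / (1.0 + exp(-z))

def mean_squared_error(weight, bias):
    """Average squared gap between predictions and known outcomes."""
    return sum((sigmoid(weight * x + bias) - y) ** 2 for x, y in data) / len(data)

weight, bias, learning_rate = 0.0, 0.0, 0.5
errors = []
for epoch in range(20):  # each epoch is one full pass over the same data
    for x, y in data:
        prediction = sigmoid(weight * x + bias)
        # Gradient of half the squared error with respect to the node's input.
        gradient = (prediction - y) * prediction * (1.0 - prediction)
        weight -= learning_rate * gradient * x
        bias -= learning_rate * gradient
    errors.append(mean_squared_error(weight, bias))

print(f"error after first pass: {errors[0]:.4f}")
print(f"error after last pass:  {errors[-1]:.4f}")
```

Each pass nudges the weight and bias a little, so the error after the last pass is lower than after the first.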
Building a responsible ML process
Machine learning processes are typically developed using open source code and code written in house. All machine learning processes should meet certain quality standards no matter who develops them or what purpose they're used for. Quality standards include the following aspects: Rigorous - in terms of the scientific methods used and the testing they go through. Responsible - in terms of how they're used and what they're used for. Trustworthy - in terms of sound implementation. Ethical - both in terms of the data and the algorithms themselves. To ensure that machine learning processes at Statistics Canada meet these expectations, we've developed a framework for responsible machine learning processes.
Framework for responsible machine learning
(Text: Assessed through self-evaluation and peer review, using a checklist and producing a report or dashboard)
(Circular diagram on the ethics of responsible machine learning. In a clockwise direction, beginning in the upper left, is titled: Respect for people; Sound application; Sound methods; Respect for data)
This is a visual representation of the framework for responsible machine learning processes at Statistics Canada. The framework is built on four themes: Respect for people, Respect for data, Sound application and Sound methods. Each theme has several attributes.
Framework for responsible machine learning
(Text: Trustworthy insight from responsible machine learning processes)
Let's go through the themes one by one. A machine learning process ensures respect for people by ensuring there's no bias or discrimination in the learning data. Everyone is treated fairly. A machine learning process that ensures respect for data is one that protects privacy of people and businesses, ensures security of data through all processing steps, and protects confidential information to prevent disclosure. A machine learning process that has sound application is one that ensures transparency and reproducibility of both the process and the results. A machine learning process that has sound methods is one whose methods are compliant with quality guidelines and uses appropriate metrics to measure accuracy and performance.
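One simple check related to the first theme is to compare how often each group receives a favourable outcome in the learning data. The Python sketch below shows one such check on invented data. It is an assumption for illustration only and not part of the Statistics Canada framework, and a gap between groups is a prompt for investigation rather than proof of bias.

```python
from collections import defaultdict

# Hypothetical learning data: (group, outcome label), where 1 is favourable.
learning_data = [
    ("group_a", 1), ("group_a", 1), ("group_a", 0), ("group_a", 1),
    ("group_b", 1), ("group_b", 0), ("group_b", 0), ("group_b", 0),
]

def positive_rate_by_group(rows):
    """Share of favourable outcomes within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, label in rows:
        totals[group] += 1
        positives[group] += label
    return {group: positives[group] / totals[group] for group in totals}

rates = positive_rate_by_group(learning_data)
gap = max(rates.values()) - min(rates.values())
print(rates)  # {'group_a': 0.75, 'group_b': 0.25}
print(gap)    # 0.5
```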
How to use ML processes responsibly
It's not enough to simply build responsible machine learning processes; they also have to be used responsibly. This means monitoring performance metrics through time. There could be evolution in the data processed by the algorithm, so it's important to monitor performance and retune the algorithm when necessary. There should be human oversight and accountability at all steps. People are ultimately responsible for all predictions and decisions that are the output of a machine learning algorithm. For all systems that use machine learning processes and most importantly, for those that directly support or make administrative decisions, it's essential to implement and enforce protocols on their use. For machine learning processes in the Government of Canada, this means ensuring compliance with the Directive on Automated Decision-Making from the Treasury Board Secretariat.
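Monitoring performance through time can be as simple as recomputing a metric for each period and flagging the periods where it falls below an agreed level. The Python sketch below assumes accuracy as the metric, invented monthly results and an arbitrary threshold of 0.75. None of these come from the framework itself.

```python
# Hypothetical results per period: (period, predictions, outcomes observed later).
monthly_results = [
    ("month-1", [1, 0, 1, 1, 0], [1, 0, 1, 1, 0]),
    ("month-2", [1, 0, 1, 0, 0], [1, 0, 1, 1, 0]),
    ("month-3", [0, 0, 1, 0, 1], [1, 0, 0, 1, 0]),
]

ACCURACY_THRESHOLD = 0.75  # assumed tolerance, to be set by the people accountable

def accuracy(predicted, actual):
    """Share of predictions that matched the observed outcome."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

def periods_needing_review(results, threshold):
    """Periods where accuracy dropped below the threshold."""
    return [
        period
        for period, predicted, actual in results
        if accuracy(predicted, actual) < threshold
    ]

flagged = periods_needing_review(monthly_results, ACCURACY_THRESHOLD)
print(flagged)  # ['month-3']
```

A flagged period is a signal for a person to investigate and decide whether the algorithm needs retuning. The check does not replace human oversight.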
Recap of key points
In this video, you learned that data science is the intersection of subject matter expertise, computer programming, and mathematics and statistics. Machine learning is a subset of artificial intelligence that focuses on teaching computers how to learn without the need to be programmed for specific tasks. Supervised and unsupervised are two types of machine learning used to predict an outcome. And we also presented a framework for how to build and use machine learning algorithms responsibly.
Further learning
You can find Statistics Canada's framework for responsible machine learning in the Data Literacy Initiative Learning Catalog, where you found this video. If you want to learn more about the use of artificial intelligence and machine learning in the Government of Canada, enter "Treasury Board Secretariat Directive on Automated Decision-Making" in the search field of your favorite browser.
(The Canada Wordmark appears.)