Natural Language Processing – Talking In A Way Machines Can Understand
Anyone who has ever used a search engine or virtual assistant has used natural language processing (NLP). Who knew that NLP is artificial intelligence (AI)? Well, probably most of the readers of this blog. In a 2018 MIT Technology Review piece, “Is This AI? We Drew You A Flowchart to Work It Out,” Karen Hao spells out the connections:
- Can it hear? If yes, does it respond in a useful and sensible way? If yes, then it is NLP. If it is using NLP, then it is AI.
- Can it read? If yes, is it reading what you type? If yes, does it respond in a useful and sensible way? If yes, then it is NLP. If it is using NLP, then it is AI.
- Can it read? If yes, is it reading what you type? If no, is it reading passages of text? If yes, is it analyzing it for patterns? If yes, then it is NLP. If it is using NLP, then it is AI.
Of course, NLP is a lot more complicated than that. The technology has been around since the 1950s. In 2017, NLP took a leap forward when the paper “Attention Is All You Need” introduced the Transformer, a neural network architecture based on a self-attention mechanism that is well suited to language understanding. The Transformer speeds up processing by handling the words of a sequence in parallel rather than one at a time, and its attention steps let the machine (machines talk in numbers) weigh each word against the others to understand context. This is where deep learning comes in.
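To make the self-attention idea a little more concrete, here is a minimal sketch in plain Python. It is an illustration under simplifying assumptions, not the full architecture from the paper: a real Transformer learns separate query, key, and value projections and uses several attention heads, while this sketch uses the word vectors directly. The three-word example and its embedding numbers are made up for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with no learned projections.

    Each word vector is compared with every word vector in the sequence,
    and the output for that word is a weighted mix of all of them.
    """
    dim = len(vectors[0])
    outputs = []
    all_weights = []
    for query in vectors:
        # Similarity of this word to every word, scaled by sqrt(dimension).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # Context-aware vector: weighted average of all word vectors.
        mixed = [
            sum(w * vec[i] for w, vec in zip(weights, vectors))
            for i in range(dim)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical toy embeddings: "gas" and "leak" point in similar directions.
tokens = ["gas", "leak", "reported"]
embeddings = [
    [1.0, 0.0, 1.0],
    [0.9, 0.1, 1.0],
    [0.0, 1.0, 0.0],
]

outputs, weights = self_attention(embeddings)
for token, row in zip(tokens, weights):
    print(token, [round(w, 2) for w in row])
```

Each row of weights shows how much one word draws on the others. Because no word's calculation waits on another's, all rows can be computed at the same time, which is where the parallel speedup comes from.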
Neural networks are not the only way to do NLP. There are other approaches, such as frequency-based and lexical database techniques. EPRI and Pacific Gas and Electric Company (PG&E) speakers did a great job of describing NLP in a podcast broadcast during the UA Summit Reimagined 2020.
In the last few years, NLP has made its way into the utility analytics tool chest. Here are some of the ways utilities are using NLP:
- Duke Energy is using NLP to quickly read through unstructured text outage liability claims. Basic rules are applied to route the claims to appropriate personnel for resolution.
- EPRI, in conjunction with Sandia Labs, is using NLP to analyze the causes of photovoltaic (PV) failures. The project uses a technique called term frequency–inverse document frequency to examine unstructured text in PV asset management databases.
- PG&E uses NLP to reduce the risk of dig-ins. PG&E’s data scientists and engineers built a self-service, cloud-based application using text classification, embedding dictionaries, convolutional feature maps, long short-term memory (LSTM), and topic modeling.
- A gas utility is using NLP to identify historical anomalous consumption for leak detection.
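For readers curious about the term frequency–inverse document frequency (TF-IDF) technique mentioned in the EPRI example, here is a minimal sketch in plain Python. It is not the EPRI and Sandia implementation. The maintenance records are invented for illustration, and the formula is one common variant (term share within a record multiplied by log(N / document frequency)); libraries often add smoothing or normalization on top of this.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Score each term in each document with a basic TF-IDF variant.

    tf  = count of the term in the document / number of terms in the document
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Count how many documents each term appears in.
    doc_freq = Counter()
    for terms in tokenized:
        doc_freq.update(set(terms))

    scores = []
    for terms in tokenized:
        counts = Counter(terms)
        scores.append({
            term: (count / len(terms)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical free-text maintenance records, not real utility data.
records = [
    "pv inverter tripped offline",
    "pv module cracked by hail",
    "pv inverter fan failed",
]

scores = tf_idf(records)
for record, record_scores in zip(records, scores):
    ranked = sorted(record_scores, key=record_scores.get, reverse=True)
    print(record, "->", ranked[:2])
```

A word that appears in every record, such as "pv" here, scores zero because it does nothing to tell records apart, while words unique to one record score highest. That is what makes the technique useful for surfacing distinctive failure terms in unstructured text.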
Like other industries, utilities use industry-specific language. Generic dictionaries are not sufficient to classify text. However, finding a well-accepted, common utility-specific language is not easy. In a recent study, EPRI found differences in text format, syntax, and taxonomy among three utilities. What’s needed is greater standardization of models, dictionaries and data collection methods. A lot can be done through adopting industry models such as the common information model (CIM), but more work needs to be done on data that is not a part of this model.