My career applying mathematics in engineering

I spent two years in the merchant navy after leaving school (because I wanted the experience), and then studied Ship Science at Southampton.  For my PhD and post-doc work I was interested in ship dynamics, especially roll motion, and this was my introduction to stochastic nonlinear systems.  I used models in both senses — mathematical and physical.  For the latter I used models of actual ships, in towing and wave tanks at NPL near London and an Admiralty research facility near Portsmouth.  I was very interested in the fact that we could model nonlinear resonance mathematically, and then compare the predictions with experiments using an actual boat.  It was always fun when we made the model capsize, but of course the purpose of the work was to try to prevent that from happening to the real thing.

I joined the Bristol Engineering Mathematics department in 1978, by which time it had existed for quite a while. The degree in Engineering Mathematics had just started with the first intake, and I was one of the new members of staff recruited to support it.

Having had the experience of applying a mathematical model to an actual system in order to improve it (in this case to make it safer), I was drawn to the department because I could see opportunities to do this more broadly. I was (and am even more so now) very impressed with the diversity and breadth of work going on in the department.  I think it’s phenomenal.

During my time in the department I gradually moved away from marine science (which was rather narrow) into other applications of mathematics, in computer science, and specifically in the interaction of people with computers via spoken language.  Not just speech recognition (although I worked a bit on that) but on other aspects of the speech such as identifying the language being spoken and especially the topic.  Rather than “What is the speaker saying?” I was more interested in “What is the speaker talking about?”, and this involved statistical modelling of the association between words, phrases and topics.

After 18 years in the department I made the decision to leave and move to the States. This was because of a project going on at AT&T Labs in New Jersey to build (and hopefully deploy) a goal-directed spoken dialogue system to replace the menus you get when you call practically any organisation.  We wanted customers to be able to just speak naturally, as they would to an agent, right from the beginning.  And some sort of conversation would be needed, to clarify what the customer actually wanted and to obtain a few details.  After that the call could either be handled by some automation or transferred to an agent to complete the task.

Because of my experience in topic identification from speech, I was the ideal person to create the part of the system that answers the “What does the customer want?” question.  They offered me the chance and I decided to take it, even though it meant leaving the department (which I did with regret) and moving to New Jersey.  That was in 1996.


The How May I Help You? system had three main modules, for automatic speech recognition (ASR), spoken language understanding (SLU), and dialogue management (DM), plus an infrastructure for acquiring data and building the necessary models.

In order to construct meaning from a string of words, the initial task was to build a classifier that used machine learning to acquire salient words and phrases from semantically-tagged data, so that new utterances could be automatically classified by topic (with a measure of confidence).  Initially I used a form of naive Bayes classifier, but later we upgraded this to a support vector machine to improve performance.
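
For readers who like to see code, here is a minimal sketch of this kind of topic classifier in Python, using modern scikit-learn rather than the tools we had at the time; the call types and training utterances are invented purely for illustration.

```python
# A minimal sketch of utterance topic classification, using scikit-learn
# rather than the in-house tools of the time. The call types and example
# utterances below are invented purely for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Semantically tagged training utterances (hypothetical).
utterances = [
    "i want to pay my bill",
    "there is a charge on my bill i don't recognise",
    "my phone line is not working",
    "i'd like to add call waiting to my account",
]
call_types = ["pay_bill", "billing_query", "repair", "add_service"]

# Word and bigram features stand in for "salient words and phrases".
naive_bayes = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB()
)
naive_bayes.fit(utterances, call_types)

# Classify a new utterance and report a confidence score.
new = ["i think my bill is wrong"]
print(naive_bayes.predict(new)[0], naive_bayes.predict_proba(new).max())

# The later upgrade to a large-margin classifier looks much the same:
svm = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
svm.fit(utterances, call_types)
```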

It quickly became clear that the classifier was only part of the story, however.  Within a conversation there is always a context, and (although we’re mostly unaware of it) we use this context constantly in understanding the full meaning and intention of what each person says, turn by turn.  This goes beyond classification, and I had to figure out a way to handle it so that the conversation would feel natural to the customer.

Dialogue management at that time was traditionally done using a vast set of if…then rules (if in *this* particular state in the dialogue then do *that*). These tend to be very cumbersome and hard to maintain.  My colleague who built the dialogue manager had a far more elegant object-oriented approach using an inheritance hierarchy in the form of a graph.  This was more declarative than procedural, far more compact, and therefore a lot easier to maintain. The clever part was in how the software used this together with the output of my SLU subsystem to guide the dialogue from turn to turn.  We realised that this graphical structure also provided a natural solution to the problem of how to take into account the context within a dialogue, so it killed two birds with one stone.  I had to figure out an inference procedure over this graph, in order to apply the contextual information, and for this I drew on my experience in the department. I had seen colleagues do a lot of work on fuzzy and probabilistic inference systems, and that experience helped me a good deal when I implemented the procedure I needed; it worked perfectly.
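
The actual inference procedure was specific to our system, so the following is only a toy sketch of the general flavour: combining classifier scores with the topic already established in the dialogue context, over a hierarchy of call types. The hierarchy, scores and combination rule here are all invented.

```python
# A toy illustration of combining classifier output with dialogue context
# over a call-type hierarchy. This is NOT the AT&T dialogue manager; the
# hierarchy, scores and combination rule are all invented.
topic_parent = {                 # child -> parent in a hypothetical hierarchy
    "pay_bill": "billing",
    "billing_query": "billing",
    "repair": "service",
    "add_service": "service",
    "billing": "root",
    "service": "root",
}

def ancestors(topic):
    """Return the chain of ancestors of a topic, up to the root."""
    chain = []
    while topic in topic_parent:
        topic = topic_parent[topic]
        chain.append(topic)
    return chain

def rescore(classifier_scores, context_topic, boost=2.0):
    """Boost call types that sit under the topic already established in the
    dialogue context, then renormalise the scores."""
    rescored = {}
    for topic, score in classifier_scores.items():
        related = context_topic == topic or context_topic in ancestors(topic)
        rescored[topic] = score * (boost if related else 1.0)
    total = sum(rescored.values())
    return {t: s / total for t, s in rescored.items()}

# Earlier turns established that the call is about billing; the new utterance
# is ambiguous between a billing query and a repair request.
scores = {"billing_query": 0.45, "repair": 0.40, "pay_bill": 0.15}
print(rescore(scores, context_topic="billing"))
```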

This system was the first of its kind, and it was extremely exciting when the company deployed it for its main customer care service in 2000, handling 15 million calls a month. A lot of mathematics went into it, and a huge amount of work. My dialogue-manager colleague, by the way, is now a Vice President within AT&T and is managing the development and deployment of all kinds of cool and innovative new services, so she has successfully made the transition from a technical to a managerial career.  I decided to remain technical as it suited me a lot better.

There are various conventional metrics for assessing the success of such a system, and we used one called the Receiver Operating Characteristic (ROC) curve.  There were knobs we could twiddle during deployment of our system, for example to control the required level of confidence for the dialogue to just proceed without confirmation.  It gets tedious for the customer if they say “I want X” and the system says “Do you want X?” every single time.  On the other hand a mis-classification is annoying to the customer, so we do want to be fairly sure. By considering various possible operating points along the ROC curve we could balance these two kinds of error. Meanwhile, by improving the classifier we pushed the whole curve towards greater accuracy.
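
As an illustration of picking an operating point, here is a small sketch on synthetic labels and confidence scores; the relative costs of the two kinds of error are purely illustrative assumptions.

```python
# A sketch of choosing an operating point on a ROC curve. The labels, scores
# and error costs below are synthetic and illustrative only.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
# 1 = the classifier's top call type was correct, 0 = it was not, each with
# an attached confidence score (synthetic data).
y_true = rng.integers(0, 2, size=1000)
scores = np.clip(y_true * 0.3 + rng.normal(0.5, 0.2, size=1000), 0, 1)

fpr, tpr, thresholds = roc_curve(y_true, scores)

# Trade off accepting a wrong classification without confirmation (false
# positive) against needlessly confirming a correct one (missed acceptance).
cost_fp, cost_fn = 5.0, 1.0                     # assumed relative costs
expected_cost = cost_fp * fpr + cost_fn * (1 - tpr)
best = expected_cost.argmin()
print("confidence threshold:", thresholds[best])
```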

Having deployed the dialogue system we started to acquire huge amounts of data. We needed to find ways to summarise this, in order to monitor the performance of the system and identify weak spots, but dialogues aren’t like numbers for which we can easily calculate means, variances and so forth.  A dialogue is a structured, sequential object, and any two dialogues can progress the same way in some places and differently in others. It was also important to take a “customer-focussed” viewpoint, because what each customer experiences is a sequence of prompts and responses rather than the same thing over and over again. It was those sequences that we needed to be able to summarise and explore.

I had taught discrete mathematics while in Bristol, so it occurred to me to use a finite-state machine (FSM) as a visualisation tool, with the prompts (messages played to the customer) in each node or vertex, and their responses on each edge or arc.  Each actual dialogue is then a path through this machine.  Any given set of dialogues could be transformed into such an FSM, minimised (which is a standard algorithm), and then rendered graphically so we could visualise and explore it via the web.  This worked great, so we turned it into a general tool and deployed it, not just for our own dialogue system but for others as well. The analysts loved it because it gave them a far more meaningful picture than simply counting how many times each prompt was played each day, for example, which tells you nothing about the flow from state to state (the customer’s experience).
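
Here is a minimal sketch of the core idea, on invented prompt names and dialogues; the deployed tool also minimised the machine and rendered it interactively on the web, which this sketch leaves out (it just emits Graphviz DOT).

```python
# A minimal sketch of turning a set of logged dialogues into a finite-state
# machine for visualisation. Prompt names and dialogues are invented.
from collections import defaultdict

# Each dialogue is a sequence of (prompt, customer_response) pairs.
dialogues = [
    [("greeting", "billing question"), ("confirm_billing", "yes"), ("goodbye", "")],
    [("greeting", "billing question"), ("confirm_billing", "no"), ("clarify", "payment"), ("goodbye", "")],
    [("greeting", "repair"), ("transfer_to_agent", "")],
]

# Nodes are prompts; each edge carries a response and a traversal count.
edges = defaultdict(int)
for dialogue in dialogues:
    for (prompt, response), (next_prompt, _) in zip(dialogue, dialogue[1:]):
        edges[(prompt, response, next_prompt)] += 1

# Emit Graphviz DOT, which can be rendered with `dot -Tsvg`.
print("digraph dialogues {")
for (src, label, dst), count in sorted(edges.items()):
    print(f'  "{src}" -> "{dst}" [label="{label} ({count})"];')
print("}")
```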


A finite-state machine visualisation of dialogue paths

I then started to introduce colour to highlight changes over time, in a rather crude way. By comparing yesterday’s dialogues with the corresponding day last week, say, some things occur relatively more often and others less.  I used a standard test for this (Fisher’s exact contingency table test) and used red to denote “significantly more often” and blue for “significantly less often”.  This was useful, but it had clear limitations. For example, that day of last week may in fact have been the abnormal one, and yesterday was just returning to normal, in which case the colours are misleading. What was needed was a more general, longer-term view of changes over time, and that’s what I worked on next.
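
In code, the per-prompt comparison amounts to a 2×2 contingency table; the counts below are invented for illustration.

```python
# A sketch of the red/blue comparison: Fisher's exact test on the counts of a
# single prompt yesterday versus the corresponding day last week. The counts
# are invented for illustration.
from scipy.stats import fisher_exact

prompt_yesterday, total_yesterday = 180, 12000
prompt_last_week, total_last_week = 110, 11500

table = [
    [prompt_yesterday, total_yesterday - prompt_yesterday],
    [prompt_last_week, total_last_week - prompt_last_week],
]
odds_ratio, p_value = fisher_exact(table)

if p_value < 0.01:
    colour = "red" if odds_ratio > 1 else "blue"   # significantly more / less often
else:
    colour = "neutral"
print(colour, p_value)
```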

The AT&T network carries an enormous amount of data, currently around 120 PB per business day. If you wrote it onto double-sided DVDs and stacked them up, the stack would be almost twice the height of Everest. I started working on a tool for monitoring subsets of this traffic, mainly with a view to detecting when something was going wrong.
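
A quick back-of-the-envelope check of that comparison, assuming roughly 9.4 GB per double-sided DVD, 1.2 mm per disc, and Everest at 8,849 m:

```python
# Back-of-the-envelope check of the DVD-stack comparison. The per-disc
# capacity, thickness and Everest height are assumed round figures.
bytes_per_day = 120e15
discs = bytes_per_day / 9.4e9
stack_m = discs * 1.2e-3
print(discs / 1e6, stack_m / 1000, stack_m / 8849)  # ~12.8M discs, ~15 km, ~1.7 Everests
```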

Traffic analysis means processing large amounts of frequency data, particularly to identify anomalies over cyclic time periods such as days and weeks.

Mostly things just chug along as normal, but sometimes the frequency of a particular event (buried in this huge pile of data) can start to trend upwards or downwards, or suddenly jump to a higher level. It’s useful to discover these changes quickly, in case they are important.


This is an hourly plot showing the frequency of a particular event over time, with daily and weekly cycles. It is quite noisy — the variance greatly exceeds the mean. The burst delineated by the red and blue lines shows an interval during which the frequency was unexpectedly high, and neither the cycle nor the noise can account for it.

In addition to being cyclic, the data streams I was working on typically turned out to be noisy, in fact “over-dispersed”, meaning that the variance was larger than the mean.  So they don’t obey Poisson statistics, and I needed to develop significance testing methods that were robust enough to cope with this. The procedure that turned out to work well involves a two-way exponential smoothing of the data (which is efficient and extracts the cyclic behaviour as a by-product), followed by a segmentation of the time series to reveal any trends and bursts that are present.
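
The deployed procedure has details I won’t reproduce here, but a much simplified sketch of the general idea is: remove the daily cycle, smooth what remains in both directions of time to get a slowly varying level, and flag hours that sit far above that level relative to a robust, over-dispersion-aware noise estimate. Everything below (parameters, synthetic data, thresholds) is illustrative, not the deployed algorithm.

```python
# A simplified sketch of burst detection in cyclic, over-dispersed count
# data; not the deployed algorithm.
import numpy as np

def two_way_smooth(x, alpha=0.05):
    """Average of a forward and a backward exponential smoothing pass."""
    def one_pass(values):
        out = np.empty(len(values))
        out[0] = values[0]
        for i in range(1, len(values)):
            out[i] = alpha * values[i] + (1 - alpha) * out[i - 1]
        return out
    return 0.5 * (one_pass(x) + one_pass(x[::-1])[::-1])

def flag_bursts(counts, period=24, k=6.0):
    # Crude extraction of the cyclic behaviour: the mean hour-of-day profile.
    profile = np.array([counts[i::period].mean() for i in range(period)])
    deseasoned = counts - np.tile(profile, len(counts) // period)
    level = two_way_smooth(deseasoned)
    residual = deseasoned - level
    # Robust scale estimate, so the burst itself does not mask detection.
    sigma = 1.4826 * np.median(np.abs(residual - np.median(residual)))
    return residual > k * sigma

# Synthetic hourly counts: a daily cycle, over-dispersed noise, and a burst.
rng = np.random.default_rng(1)
hours = np.arange(24 * 28)
cycle = 50 + 30 * np.sin(2 * np.pi * hours / 24)
counts = rng.negative_binomial(n=20, p=20 / (20 + cycle)).astype(float)
counts[300:320] += 200
print(np.flatnonzero(flag_bursts(counts)))   # should pick out hours ~300-319
```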

This was extensively deployed within AT&T. The first application was to customer care data, and I used it to illustrate how we could discover anomalies such as the special needs of customers in Louisiana and Mississippi in the immediate aftermath of Hurricane Katrina.  This made the case for more real-time data feeds, and the tool was subsequently used a lot by the AT&T Security organisation for detecting various kinds of attacks. Alarms generated by my tool were sent to the Network Operations Center to be investigated and acted on. If the Google servers were to go down, for example, then you wouldn’t be able to do a search for a while, which would be inconvenient.  But if a core communications network (or part of it) is compromised then everything is off: emergency services, financial transactions, schools and hospitals are isolated, people can’t call home. It was cool to see my code being used (in a small way and alongside other tools) to help to protect this vast network which underpins so much of the infrastructure of society.


The map shows a hot-spot on the Gulf Coast in 2005, where the customers impacted by the hurricane had unusual and urgent needs. My software discovered this from a feed of customer care agent notes data.

In 2015 I decided it was time to retire, left New Jersey and returned to Bristol. Except that I haven’t really retired!  I’m very grateful to the department for appointing me Visiting Emeritus Professor, and I’m now on the Industrial Advisory Board.  I expect to get more involved in other ways during the next academic year.  Meanwhile I’m still continuing the time series analysis work, because I think there’s more to do and it can be used in different fields.  I have an entirely new code base, and I’m using it to analyse climate data such as the Central England Temperature record, the longest continuous set of instrumental temperature readings that exists.  It’s really interesting to discover the trends and anomalies within that record and to evaluate their significance. Obviously the current warming trend is unprecedented, and this is very important.


This is the monthly data since 1659, and I’ve used my tool to remove the annual cycle and plot the mean (blue) with two standard deviations (grey zone). The red line is a piecewise-linear regression and the corners locate the most significant changes.
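
For anyone curious about the piecewise-linear part, here is a self-contained sketch that fits a single breakpoint by brute force. It runs on synthetic anomalies rather than the real CET record (which the Met Office Hadley Centre publishes), and my own tool handles multiple breakpoints and significance testing; the numbers below are illustrative only.

```python
# A self-contained sketch of a piecewise-linear fit with one breakpoint, on
# synthetic "deseasonalised temperature anomaly" data, not the real record.
import numpy as np

def fit_one_breakpoint(t, y):
    """Scan candidate breakpoints; at each, fit two least-squares lines and
    keep the breakpoint with the smallest total squared error."""
    best = None
    for i in range(10, len(t) - 10):                # keep segments non-trivial
        left = np.polyfit(t[:i], y[:i], 1)
        right = np.polyfit(t[i:], y[i:], 1)
        sse = (np.sum((y[:i] - np.polyval(left, t[:i])) ** 2)
               + np.sum((y[i:] - np.polyval(right, t[i:])) ** 2))
        if best is None or sse < best[0]:
            best = (sse, t[i], left, right)
    return best

# Synthetic annual anomalies: flat until 1900, warming afterwards, plus noise.
rng = np.random.default_rng(2)
years = np.arange(1659, 2024)
trend = np.where(years < 1900, 0.0, 0.01 * (years - 1900))
anomaly = trend + rng.normal(0, 0.5, size=len(years))

sse, corner, left, right = fit_one_breakpoint(years, anomaly)
print("breakpoint near year", corner, "later slope ~%.3f degC/year" % right[0])
```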

I’ve been a keen photographer for a long time, and now I have time I’m getting more into image processing and starting to build my own website. So that’s a project that will keep me busy for a while. But mostly I want to stay involved with the department: one doesn’t lose one’s passion for science, education and research on the day one starts to draw the pension, at least I didn’t!  It’s more a way of being than a job, and I’m very lucky to have spent my career working on engineering applications of mathematics in various fields, and to have seen them make a difference.
