Robot Agents Replacing Manual Labor
Guido Outsourced by Guru
T McCabe
January 2019
A true story.
Blanco Bank is located at 5254 Lincoln Avenue, Monterrey, Mexico. Its motto of ‘rock solid service’ was shaken when it installed an ‘expert system’.
On the morning of October 14, 2016, Mr. Guido Garcia, a twenty-five-year Blanco Bank teller, sits down with the newly hired architect of an expert system. Mr. Garcia is less than excited; the expert, Dr. Guru, is viewed as the enemy, and yet he expects to learn the nuances of the teller job from the about-to-be-fired Guido Garcia. Dr. Guru generates a narrative description of Guido’s manual labor, walks Mr. Garcia to the outplacement office, whispers “yikes”, and runs off to do his ‘real work’ -- building the expert system.
Big surprise, it didn’t
work.
Not really a surprise: there has been no testing of the requirements, nor is there a plan for testing the ‘as built’ AI system. It’s easy for Guru -- he blames it on the ex-employee Guido, who is long gone and embittered. And Dr. Guru labors on, painfully discovering missing nuance after missing nuance, one by one, building multiple failing AI versions. The budget is blown by a factor of ten. Typical. But not that unexpected, from Guru: ‘these AI systems are a challenge’.
Here is a better
approach.
Let’s go back to the morning of October 14, with Mr. Garcia describing his bank job. He can describe it as a business process -- there’s a place he starts, there’s work he does, there are decisions he makes, there are iterations he goes through, and then finally, at some point, he’s done. In practice this is often captured by creating a business process model (BPM) -- several business process modeling languages are in popular use: Business Process Modeling Notation (BPMN), XML Process Definition Language (XPDL), and Business Process Execution Language (BPEL).
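As a minimal sketch (in plain Python rather than a formal BPMN tool), the teller’s job can be captured as a directed graph of steps and decisions. Every step name below is hypothetical, invented purely for illustration:

```python
# A hypothetical teller workflow as a directed graph: each node maps to
# the nodes it can flow to. This is the same structure a BPMN/XPDL/BPEL
# model would capture formally.
workflow = {
    "start":            ["greet_customer"],
    "greet_customer":   ["identify_request"],
    "identify_request": ["deposit", "withdrawal"],     # decision point
    "deposit":          ["update_ledger"],
    "withdrawal":       ["check_balance"],
    "check_balance":    ["update_ledger", "decline"],  # decision point
    "update_ledger":    ["done"],
    "decline":          ["done"],
    "done":             [],
}

# Every node with more than one outgoing edge is a decision Mr. Garcia
# makes -- each one adds paths the expert system must eventually handle.
decisions = [n for n, outs in workflow.items() if len(outs) > 1]
print(decisions)  # ['identify_request', 'check_balance']
```

Even this toy graph makes the narrative's structure explicit: a start, work steps, decisions, and an end.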
Mr. Garcia’s description, by contrast, was overly verbose and merely conversational -- without the rigor of a formal BPM. However, it is indeed an algorithm, in narrative form. And for an AI system to replace a manual worker, the expert system has to start with an algorithm.
Such a loosely described algorithm has inherent complexity. In fact, it intrinsically has the classical McCabe cyclomatic complexity, which will tell both Guru and Guido about the inherent complexity of the job -- it can be compared, in complexity terms, with other jobs that have already been automated. The complexity predicts how much work it will be to build such an AI agent. Just as importantly, the complexity determines the requirements validation tests to run on both the narrative description and on the AI system when it is complete.
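For a single connected process graph, the cyclomatic complexity is V(G) = E - N + 2, where E is the number of edges and N the number of nodes. A minimal sketch, using a hypothetical stand-in for the teller’s workflow:

```python
# A hypothetical, simplified stand-in for the teller's BPM:
# each node maps to the nodes it can flow to.
workflow = {
    "start":    ["request"],
    "request":  ["deposit", "withdraw"],   # decision: transaction type
    "deposit":  ["done"],
    "withdraw": ["check"],
    "check":    ["done", "decline"],       # decision: sufficient funds?
    "decline":  ["done"],
    "done":     [],
}

nodes = len(workflow)                           # N = 7
edges = sum(len(v) for v in workflow.values())  # E = 8
complexity = edges - nodes + 2                  # V(G) = E - N + 2P, with P = 1
print(complexity)  # 3 basis test paths for this job description
```

A V(G) of 3 here means three basis test paths -- a number that can be compared across jobs and used to size both the build and the testing effort.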
This requirements validation can be done straight from the narrative -- sloppy but effective. A better way is to use a business process modeling language to describe the teller’s job.
We can then compute the McCabe complexity of the BPM job description -- notice that here we are measuring the complexity of the requirements.
The McCabe complexity is the number of basis test paths within the bank’s BPM. It is common practice to limit the complexity of BPMs using McCabe complexity (see Reference 1); what is new here is using the complexity to generate the BPM test paths. This yields the basis test paths and their data -- both to validate the BPM and to obtain tagged learning data for the robot agent. More rigorously, the complexity-delineated test cases form an equivalence partition of the universe of robot-agent test data -- see the footnote.
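Assuming an acyclic, single-entry process model, the end-to-end paths can be enumerated directly -- one basis path per equivalence class. A sketch, again with a hypothetical teller workflow:

```python
# Enumerate every end-to-end path of an acyclic process graph.
# Each path is one basis test case -- one equivalence class of test data.

def paths(graph, node="start", trail=None):
    """Return every path from `node` to a terminal node (no outgoing edges)."""
    trail = (trail or []) + [node]
    if not graph[node]:            # terminal node: one complete path found
        return [trail]
    return [p for nxt in graph[node] for p in paths(graph, nxt, trail)]

# Hypothetical teller workflow, invented for illustration.
workflow = {
    "start":    ["request"],
    "request":  ["deposit", "withdraw"],
    "deposit":  ["done"],
    "withdraw": ["check"],
    "check":    ["done", "decline"],
    "decline":  ["done"],
    "done":     [],
}

for p in paths(workflow):
    print(" -> ".join(p))
# Three paths, matching V(G) = 3 for this graph. Garcia and Guru would
# walk through each one, attaching nuanced test data to each class.
```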
At this point, Mr. Garcia and Dr. Guru would walk through each BPM-generated basis path, thereby flushing out errors as Mr. Garcia explains the nuances of each path. Even though this looks like unit testing, Guido and Guru are in fact testing the requirements before building the forthcoming artificial intelligence system.
Requirements errors are very expensive -- on the order of 270x the cost of ‘coding errors’. Best to catch them right here.
The very same basis test paths derived from the BPM description serve as a good foundation for an acceptance test of the as-built AI system. Each equivalence class would be expanded with nuanced test data. The acceptance test team should include knowledgeable bank employees, including Mr. Garcia.
Ranking the portfolio of Blanco Bank’s manual jobs by their McCabe complexity gives order-of-magnitude estimates of both the effort of building an expert system and the inherent testing that must take place. Also, keeping track of the number of requirements errors found up front will predict the reliability of the as-built AI system.
What is not explained here is the big payoff in rigor. The current state of practice includes neither requirements testing nor upfront modeling of the business process. There are many reasons and many excuses for not validating an AI system at the requirements stage. Here is a way to test a robot agent from the requirements before building it, against the requirements after it is built, and with the participation of the very workers who had been doing the job beforehand.
Not to mention, Mr. Garcia gets
some respect.
------------------------------------
Footnote:
The use cases so derived from a BPM or job description become an equivalence partition of the test-data universe for the robot agent. This gives at least one test case per equivalence class to validate the requirements up front. What’s more, for the millions of AI data points -- called tagged data -- used to teach and test the robot agent, the equivalence classes generated up front become a classification scheme. This means that all subsequent training and test data for the agent is cleanly partitioned into those equivalence classes.
One corollary of this result is the possible machine generation of robot test data within each equivalence class. A machine could fill out the data within each class, making the data robust and comprehensive for both robot training and robot testing.
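A minimal sketch of that corollary, assuming three hypothetical equivalence classes for the teller’s transactions (the field names, value ranges, and class labels are all invented):

```python
# Machine-generate tagged training/test records within each equivalence
# class -- one class per basis path of the hypothetical teller workflow.
import random

random.seed(0)  # reproducible data for the sketch

# Hypothetical classes and value ranges, one per basis path.
classes = {
    "deposit":           {"kind": "deposit",  "amount": (1, 10_000)},
    "withdraw_ok":       {"kind": "withdraw", "amount": (1, 500)},
    "withdraw_declined": {"kind": "withdraw", "amount": (500, 10_000)},
}

def generate(n_per_class=3):
    """Fill out each equivalence class with n_per_class tagged records."""
    records = []
    for label, spec in classes.items():
        lo, hi = spec["amount"]
        for _ in range(n_per_class):
            records.append({
                "kind": spec["kind"],
                "amount": round(random.uniform(lo, hi), 2),
                "tag": label,   # tagged data for training and testing
            })
    return records

data = generate()
print(len(data))  # 9 records, evenly covering every equivalence class
```

Because the classes come from the basis paths, coverage is guaranteed by construction: no class of transactions can be silently missing from the generated data.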
Appendix 1:
Besides a narrative job description or building a BPM, another common practice for deriving learning data for a robot agent -- such as our robot teller -- is to gather, analyze, and transform log data. Since this job is being done by a human and is partly automated, the log data will be a mix of hand-created and computer-generated entries. The care and feeding of log entries is a messy, dirty job -- often given to a data scientist. It involves collecting massive amounts of data, often terabytes, from a variety of databases; the logs can be transaction logs, event logs, audit logs, or server logs, often spread across distributed databases, each with its own unique format. As you can see, this is a messy, dirty business. Our methodology of using the BPM test paths is much cleaner.
Appendix 2:
This article ignores a characteristic many AI systems share: the artificial intelligence learns as it goes. This issue gives rise to the notion of partial algorithms, and to the derivative notion of the complexity of partial algorithms, which will be discussed in a sister article.
Appendix 3:
The requirements validation is a path-by-path walk-through of the BPM -- actually a walk-through of each basis path of the BPM. This is a nontraditional but more effective way to conduct a walk-through. It’s more rigorous than walking through the BPM line by line, because we are going through the paths one by one -- in effect, testing them as the computer would execute them. We are indeed testing the requirements before writing any code.
Appendix 4:
‘Machine learning’ taught from log data is an alternative to the approach here. It is typically done with log data that has to be resurrected from within corporate databases. There is a class of errors that using transaction logs alone will miss: errors of omission.
For example, it was recently reported that a hospital AI agent was built to diagnose and triage pneumonia patients. It worked well, except that it missed a major category of pneumonia patients: those who also have asthma. Doctors and emergency-room nurses know well that having both asthma and pneumonia will send somebody straight to intensive care. The AI system missed this. The intent was to send people home with antibiotics quickly -- and save hospital time and money. It was a major flaw, and people could have died as a result of it.
This is an example of an error of omission. When you take existing log data to train machine learning, there is always the possibility that you are missing a category of data -- a whole equivalence class of test data. Log data is messy and has to be cleaned up, and it’s easy to miss an entire category.
Using a BPM up front would not make the same mistake. An explicit equivalence class of patients with both pneumonia and asthma would have been built into the test data.
Epilog:
Four months later, as you can tell by the picture above, hard times fell on the good Dr. Guru. Guru did not follow the methodology described above; he delivered his expert system three months late and claimed it had been thoroughly tested. The bank trusted his judgment and put his expert system into operation the next day.
Whereupon it failed. Not just on some boundary conditions -- it failed spectacularly every time. Benito Blanco, the founder and president of the bank, was enraged and fired Guru on the spot.
Benito went to Garcia’s home to beg him back to his old job. It took two months for Benito to locate Garcia, who had downsized and moved to the less expensive El Barrio. When the two men finally met, Benito offered to double Garcia’s pay.
It was too late. Garcia had taken another job at a competing bank. He had gotten the job during the interview by describing the horrific mistake Blanco Bank had made with that foolish expert system and that cranky Dr. Guru.
Garcia’s first
task on his new job was to brief all the executives in the new bank – imploring
them to avoid, at all costs, any expert system or anyone with a name like Guru.
Postscript:
A true story? Not in the historical sense.
But more than true in our technology lives. Hundreds of millions of dollars have been lost because of a lack of upfront testing of requirements. In this sense, the story is sadly more than true. It’s true as a modern-day allegory.
A Jewish proverb has it that ‘a story is truer than truth’; in this sense, too, the story is true.
Reference 1: See ‘Managing the Complexity of Business
Process Models’, ftp://public.dhe.ibm.com/software/solutions/soa/newsletter/2010/newsletter-apr10-article_complex_bus_processes.pdf