As computers are increasingly part of everyday life, activities that are performed by computers or recorded by computers are increasingly part of disputes, sometimes legal disputes. When these disputes occur, it's important to be able to understand the reliability of evidence that is generated by these computer systems. Sometimes computers directly record information, so for example, if someone was withdrawing money from an ATM, the time that money was withdrawn and the amount that was withdrawn are going to be recorded. But more often than not, the types of information that we care about are not directly recorded, so for example, the PIN is not recorded, for good security reasons, when it's entered at an ATM. However, computers do enforce rules, or at least they're designed to enforce certain rules. For example, they require that before money can be withdrawn the correct PIN and the correct card are used. So together with the records that we have from the computer and what we know about the rules enforced by the computer, we can infer what has happened. So we can infer that if money is withdrawn from an ATM then the correct card and PIN were used. However, this isn't always the case. We know of several ways of withdrawing money from an ATM without the correct card or without the correct PIN, and some of the reasons for that are bugs in the computer systems. Sometimes these bugs cause a computer to fail to record accurate information, and sometimes the bugs cause it to fail to enforce the rules it was supposed to. These bugs exist because software is written by humans and humans make mistakes. However, not all bugs result in failures. A bug has to be encountered in the right set of circumstances to be triggered, and when a bug gets triggered, that's what causes a failure.

But this is all very abstract. Let's look at a real example of a bug. This code is in Apple's macOS and iOS. It was introduced in 2013 and contains a very serious bug, and that bug was undetected for four months. What this code does is implement security for network communications. It's there to detect network communications being tampered with, and it's there to detect eavesdropping. It applies to things like web browsing, but it could also apply to other activities over the network like mobile banking. And this code has a bug. Here it is. This is probably a copy and paste error. Just as it's very easy to do this when you're writing something out in English, it's very easy to make the same mistake when you're writing code. Just like in English, you hope that when it's being proofread these sorts of mistakes will be spotted and removed. However, that doesn't always happen. Here's an example of a published book where there's been, probably, a copy and paste error leading to a duplicated paragraph.

So if you're familiar with C, this code will make sense; if you're not, then the punctuation is a bit of a distraction from what the code is actually doing, so let's replace it with a version written in more familiar terms. Now, code is normally executed by the computer in the same way that you read it: left to right, top to bottom. But code can also specify that execution should jump around, and one way of doing this is the "if" statement. You can see one of these over here, and what this does is: if a particular condition is met, then the chunk of code directly below gets executed; otherwise that bit of code gets skipped.
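Here is a minimal, self-contained sketch in C of that "more familiar" version, so the walkthrough below can be read against it. The function names and details are invented for this explanation rather than taken from Apple's actual code; the shape of the bug, including the duplicated line, is what matters.

#include <stdio.h>

/* Illustrative stand-ins for the real work: each returns 0 for "no problem".
   These names are made up for this explanation; they are not Apple's. */
static int read_part_one(void)   { return 0; }
static int read_part_two(void)   { return 0; }
static int read_part_three(void) { return 0; }
static int check_security(void)  { return 1; }  /* pretend the security test would find a problem */

static int verify_message(void)
{
    int err;

    if ((err = read_part_one()) != 0)
        goto fail;
    if ((err = read_part_two()) != 0)
        goto fail;
    if ((err = read_part_three()) != 0)
        goto fail;
        goto fail;               /* the copied-and-pasted line */

    err = check_security();      /* the security test itself */

fail:
    return err;                  /* report whether the last thing done had a problem (0 means no problem) */
}

int main(void)
{
    printf("result: %d\n", verify_message());
    return 0;
}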
The other way we have of moving around code is "goto", and what goto does is: when it's encountered, execution will skip to a label. The label here is "fail", so if this is executed we'll jump down to here.

To understand the problem in this code, let's run through a few different scenarios. What the code does is check for security problems in a message, but the message is made out of three chunks of data, and before we can check whether there are any security problems we need to read those three chunks, combine them, and then check for issues. So let's assume now that there's a problem with the second part of this data. The first thing that we're going to do is read in the first part of the data, and that happens here. We then check if there's a problem, but there's no problem, so we skip over the goto fail and move on to the next line. We now read the second part of the data; there is a problem here, so we execute the goto fail statement and jump down here. Now we check if there was a problem with the last thing that was done. The last thing that was done was checking if there's a problem in the second part of the data. Well, there was, so this code will now report that there was a problem, because there was a problem with the second part of the data. So the code is working properly in this scenario.

Now let's consider if there's a security problem. The security problem gets checked in this part of the code, so let's see what happens. But before we do that, let's look at where the bug is. The bug is over here, and because this is indented, you might think that it belongs with the "if" statement, but it doesn't. The computer doesn't care about indentation. If we were to indent this properly, then this statement should really go here. Now why is this going to cause a problem? Well, let's work through our security problem scenario. The first thing we're going to do is check for a problem in the first part of the data. Well, there's no problem there. We then check for a problem in the second part of the data. There's no problem there. And the third part of the data: again there's going to be no problem, so we skip over the first goto fail, but we execute the second goto fail, which runs unconditionally. And now the code jumps to the end. It reports if there's a problem with the last thing that was done. The last thing that was done was reading the third part of the data; there was no problem there, so this code will report that everything's fine, when actually there was a security problem. So that is a very serious failure.

But why was this never identified? Why was this bug left for four months? Well, the reason is that code will normally be tested in the scenario which normally should happen, which is the "everything's okay" scenario. So let's work through that. We read the first part of the data; that's fine; the second part of the data; that's fine; the third part of the data; that's fine; and then we hit the unconditional goto fail. We check if there's a problem; well, there was no problem with the last thing that was done, which was reading the third part of the data, and so we report that everything's okay. It looks like the code is working properly, but in fact it never did the security test, which is the whole reason this code exists in the first place. So it looks like this code is working, but actually it's not.
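Re-indenting the relevant lines of the illustrative sketch above the way the computer actually interprets them makes the problem plain. This is an excerpt of that sketch (reusing its made-up helper functions), not Apple's code:

    if ((err = read_part_three()) != 0)
        goto fail;
    goto fail;                   /* not part of the "if" above: always executed */

    err = check_security();      /* therefore never reached */

fail:
    return err;                  /* err still holds the result of reading part three: 0, "no problem" */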
So this is an example of code in macOS and iOS, the operating system, but the failure could manifest itself in any other code that relies upon this functionality provided by the operating system; and this is generally true for software. Software failure can occur because of bugs in that software, but also in any other bit of software that it relies upon. So if you've got a mobile banking app, bugs in the mobile banking app will affect it, but also bugs in the operating system and bugs in the processor of the hardware that's being used: the iPhone, in this case. And these items are all written by different people, so the mobile banking app will be written by the bank and others; the iOS operating system is written by Apple, but also others; the processors run code that is written by ARM and by Qualcomm and lots of other different parties. These parties are to some extent cooperating and to some extent competing, so they might not always share the information between each other that would be desirable for security purposes. The other thing to take away here is that there's a lot of code. Here the banking app might only be 50,000 lines or so, but the operating system will be tens of millions, and the processor (although it's hard to estimate what code is running on a processor) will be something like millions of lines of code, so a bug in any of those millions of lines of code could cause a problem.

But what does that look like in real life? Here is the code that was written for the Apollo program, for the guidance computer. It's not just the 100,000 or so lines of code that went to the moon, but also the tests and different versions of this code, so there's about 600,000 lines of code here. And it is standing next to Margaret Hamilton, the software engineer who wrote, and also led the team that developed, this software between 1961 and 1969. Margaret Hamilton was also one of the founders of the field of software engineering, and this code was extremely well written, but it still had bugs. You can see the effects of some of those bugs in the final stages of the Apollo moon landing. There were ways of mitigating those bugs, and the mission still succeeded, but that's the sort of number of lines of code that was able to put mankind on the moon.

But how about this mobile banking app? By the same scale, the pile of paper that would contain all the code that is relied upon by a mobile banking app would be the size of a building; in this case, the Ministry of Justice. And in these 13 million lines of code, there are going to be a lot of bugs. Depending on how well written and well tested code is, there will be something like 1 to 25 bugs per thousand lines of code. So if we've got 13 million lines of code, that means there are going to be tens of thousands or hundreds of thousands of bugs that could affect a typical mobile application. The only reason we're able to get anything done is because of testing after the software is written: testing simulates a large number of scenarios and checks to see whether the code works. But this testing can never be comprehensive; we can never check all the scenarios, so the scenarios that are prioritized are the scenarios that are believed likely to occur in real life. The other way that bugs are identified is that once the software is in use, there are going to be some failures that are noticed.
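As a rough back-of-envelope check on those figures (a sketch using the numbers quoted above, not a measurement of any particular app), the arithmetic looks like this:

#include <stdio.h>

int main(void)
{
    /* Roughly 1 to 25 bugs per thousand lines of code, applied to the
       ~13 million lines of code a mobile banking app ultimately relies upon. */
    long lines     = 13000000L;
    long bugs_low  = lines / 1000 * 1;    /*  13,000 bugs */
    long bugs_high = lines / 1000 * 25;   /* 325,000 bugs */

    printf("estimated bugs: %ld to %ld\n", bugs_low, bugs_high);
    return 0;
}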
These get reported back to the company that produced the code, and then they should be investigated; when a bug is discovered it should be fixed, and the updated version of the code gets shipped out. But again, this can never be comprehensive; there are always going to be bugs left at the end of this process.

So if there are so many bugs, how can we rely upon electronic evidence? The way that the law deals with this in England and Wales is based on a recommendation from the Law Commission in 1997, which is to treat computers as mechanical devices: accept that mechanical devices will have failures, but start from the position that they're operating correctly. So this is a legal presumption. The problem with this legal presumption is that the intuition that comes from dealing with mechanical devices doesn't necessarily follow when you're dealing with the types of failures that are common in computers. One reason is that while mechanical devices might be assumed to fail randomly, bugs typically don't occur at random; they're triggered by a particular scenario, and that will mean that for some individuals or for some locations you might see lots of different bugs, but somewhere else there'll be very few. And this looks, from the perspective of someone who's relying on the evidence, as if these individuals are maybe acting incompetently or perhaps even criminally. But actually this could just be down to the way that the bugs get triggered and how they manifest themselves. A second way that computers are different from mechanical devices is that failures can be induced for malicious purposes. If someone knows that a bug is present, they can trigger the scenario in which the bug will exhibit itself, even if that would be very unlikely or perhaps even impossible to happen by chance. And so you might have very common occurrences of failures which, if you just assumed that bugs happen at random, you would expect never to happen.

So bugs can cause failures. These failures can be very significant in terms of the reliability of evidence, but it can often be unclear whether a particular bit of evidence that is in dispute is correct or incorrect, and whether a failure was due to human error or due to dishonesty. Distinguishing the different possibilities can take a huge amount of effort. It's the sort of thing that you might see in major disaster investigations like a plane crash, but it's not something that you would typically see in disputes, and when a dispute is over a relatively small amount of money it might not be proportionate to investigate the software in enough detail to establish with a high degree of confidence whether a bug affected the particular bit of evidence that's being relied upon.

However, the good news is that software engineering as a field has developed practices and processes for how to develop reliable software. Some of these are standardized; some of these have been developed over time; but all of them have the common feature that documentation will be generated as the software is developed, tested, and deployed. This sort of documentation will include records of bugs where the software has been identified to have failed, what the effects of those bugs were, and how they were responded to; there will also be tests that are performed, and the documentation should list these tests and how their results have been acted upon.
And there should also be processes in place to make sure that software doesn't get modified without proper controls, and there should be evidence that the software has not been tampered with. This documentation should be readily available to be produced in the case of disputes, and because it already exists, the cost of doing so should be quite manageable. From this, a reasonable assessment can be made as to whether the software development process has been properly managed. If the documents are absent, or show the types of failures that could explain the matter in dispute, then that shows that a more detailed investigation is going to be justified. So together, this documentation can give a good idea as to the general reliability of the software and be very informative when computer evidence comes up in court.

But while general reliability is very helpful, it would still be good to have specific guidance on whether a particular bit of evidence is reliable, or whether it aligns with one side of the dispute. So it would be better if we could, in a cost-effective way, answer those sorts of questions. The reasons why it's hard to do that today are, first, that the requirement to produce evidence often wasn't thought about when the software was being developed, and the sorts of facilities that are used for producing evidence were not designed for that purpose. They might have been designed to help software developers understand what's going on, or to understand the reliability of the software in general, but not directly for producing evidence. And the second reason is complexity: software is incredibly complex, any bit of the software can interact with any other bit of the software, and bugs that come from these interactions can be very hard to identify.

So there's ongoing research in computer science to develop techniques to, first of all, design in the requirement to produce evidence when the software is created, while it's still possible to do so, so that the facilities for producing evidence work as intended; and secondly, to limit the complexity of the software by limiting the way that different bits of software can interact with each other, so reducing complexity and making it easier to reason about the software. But while this research is going to produce results, it's going to be a long time before we see this software out in the real world, so the legal profession will need to find ways in the meantime to deal with the imperfect, complex software that is out there, and find ways to properly reach fair judgments in a cost-effective way that will still be in the interests of justice.