Quantifying risk

Balancing risk is a difficult discipline. The standard definition of risk is:

Risk = probability of an accident * impact of the accident.

A discussion of risk often starts with this definition, then goes on to show that the definition is wrong or meaningless. When an uncertain but very small number is multiplied by an equally uncertain but very big one, the product could be anything in between. It’s a formula that can be used to argue both that billions should be spent on stopping asteroids from colliding with the earth and that the issue should be ignored altogether.
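To make the point concrete, here is a minimal sketch that multiplies a range of probabilities by a range of impacts. All the numbers are made up for illustration (they are not estimates for asteroids or anything else), and `risk_range` is a hypothetical helper name.

```python
def risk_range(p_low, p_high, impact_low, impact_high):
    """Return the (lowest, highest) value of probability * impact
    when both factors are only known to lie within a range."""
    return p_low * impact_low, p_high * impact_high


# Hypothetical inputs: a yearly probability known only to within three
# orders of magnitude, and an impact (in some currency) known only to
# within four orders of magnitude.
low, high = risk_range(1e-9, 1e-6, 1e9, 1e13)
print(f"risk lies somewhere between {low:.3g} and {high:.3g} per year")
```

With these assumed ranges the two ends of the answer differ by about seven orders of magnitude, which is enough room to justify almost any budget, or none at all.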

More down-to-earth events might be even harder when you try to determine the probability or quantify the “impact” of something. This was exemplified recently. A high-profile project in Norway is the test center for carbon capture at Mongstad, which aims to reduce CO2 emissions from industry. The technology for this might be amine-based, which prompted an article in Teknisk Ukeblad on the toxic emissions such a carbon capture plant might produce. The emission of amines is said to present a risk of cancer to people living near the plant, or to be toxic in other ways. (I know nothing of amine chemistry and can’t vouch for this.)

In a comment, the Norwegian Pollution Control Authority said (my translation):

- As always, we will try to reduce the risk as much as possible. CO2-emissions also have consequences and this will have to be considered in a complete evaluation. We try to reduce the environmental risk as far as possible.

How do you even start weighing these risks against each other? Should the cost of global warming be divided by the Mongstad plant’s contribution to it? What is the cost of cancer, and how do you factor in the uncertainty if we don’t know but only suspect that the amines are dangerous? Is the future benefit of carbon capture technology to be added into the equation? It’s no wonder risk assessment and management is a discipline in its own right today, though I suspect they ditch the formulas long before they get to a case as complicated as this one. :-)

Yahoo wrote up their key scientific challenges recently. It’s interesting to compare their machine learning challenges with those facing data mining and machine learning efforts in oil well drilling. The list was made by John Langford, who comments that the challenges are general enough to have applications outside Yahoo. And indeed, three of the five challenges mirror challenges in analysing drilling time series:

  • The problem of nonstationary data is the most obvious and is shared by most real-world time series. The multivariate time series changes abruptly when the driller switches between tasks or “drilling modes”. A given flow rate could indicate business as usual in one mode and spell disaster in another. Therefore, alarm systems and other analyses don’t get far without somehow recognizing these modes.
  • The second of Langford’s challenges relates to label complexity and the lack of labeled data. In a drilling time series, the labels are sparse and coarse-grained. They point out a few unusual events and summarize stretches of routine operation orders of magnitude longer than the mode changes mentioned above. Manual labeling with the help of an expert is possible but time-consuming and expensive. Among the solutions, Langford mentions semi-supervised methods, an avenue of exploration shared with intrusion detection in computer networks, a domain similar to drilling in that it’s only feasible to manually label a tiny part of the data.
  • The last challenge has been named Exploration. As Langford puts it: “You can’t rewind a user and try a different action, so you only get feedback for the chosen action”. As with nonstationarity, this is typical for real-world data, and it’s perhaps not surprising that drilling faces a similar challenge. The high cost of drilling a well and the consequences of making a mistake mean that some sequences of actions or choices of parameters are never found in real-world data.
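The first point, that the same flow rate can be normal in one mode and alarming in another, can be sketched as an alarm check keyed on the current mode. This is only an illustration: the mode names, the limits and the `flow_alarm` helper are all assumptions, and it sidesteps the hard part, which is recognizing the mode in the first place.

```python
# Hypothetical normal flow-rate bands per drilling mode, in litres per
# minute. Both the mode names and the numbers are invented for this sketch.
FLOW_LIMITS = {
    "drilling": (1500.0, 2500.0),
    "tripping": (0.0, 200.0),
}


def flow_alarm(mode, flow_rate):
    """Return True if flow_rate falls outside the normal band for mode."""
    low, high = FLOW_LIMITS[mode]
    return not (low <= flow_rate <= high)


# The same reading is routine in one mode and an alarm in the other.
print(flow_alarm("drilling", 2000.0))  # False
print(flow_alarm("tripping", 2000.0))  # True
```

A single global threshold would have to either miss the second case or raise false alarms in the first, which is why mode recognition comes before almost everything else.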

So the challenges appear very general. But will the solutions be as generally useful as the problems? In my PhD I’ve found that examples from biology, proteomics and intrusion detection help me understand where the stumbling blocks in drilling time series analysis lie. (Perhaps a topic for later posts?) But I’ve also tried applying methods from a similar problem, only to see them fail due to seemingly small differences in the problem statements.

In any case, the last two challenges are problems I don’t seem to have come across in any of the introductory machine learning texts I’ve read. The subjects are easily explained, so have I read the wrong books, or have they been blind spots in the curriculum until recently?

This video shows a nice lab experiment with drilling-induced vibrations in the riser. It looks fairly convincing, though it only explores the effect of the rotating fluid between the drillstring and the riser, not transverse vibration set up in the drillstring itself. Unless the springs in the setup are meant to simulate that?

I suppose they’ve published a paper on this too, but I’ve had no luck tracking it down.
