In this era of Pharma 4.0, with the latest technology stacks at hand, everyone is trying to apply machine learning and AI models to their historical process data. The rapid adoption of digitalization technologies has enabled the industry to think along multiple dimensions, but it has also compelled us to look back at our digitalization strategies and approaches, which eventually dictate the outcome.
This technological surge brings its own demands, which are now sometimes becoming a major obstacle for many industries trying to derive insights from their own long-accumulated data. It has also demanded disruptive changes in organizational structure and management, with groups being replaced, restructured, and populated by people conversant with the new technologies. But somewhere along the way, this change, which was too aggressive, has forced us to think again: have we missed something? Maybe domain know-how, or physics?
All of these fancy ML/AI models (Python packages) have made our lives so easy that even a rookie with minimal knowledge of the technology can apply them to data without much domain knowledge. But then we miss out too much on the domain part. To combat this, and to give the implementation team more accurate and trustworthy predictions from our ML/AI models, “hybrid models” come to our rescue. These models are simple Python scripts like any other Python-based ML/AI model, but with the additional flexibility of adding domain know-how on top.
When it comes to the biopharma industry, keeping inferences constrained within the laws of physics is of the utmost importance. Physics ensures that predictions and prescriptions stay well within the allowed extent of variability for highly constrained environments such as those found in pharma and life-sciences companies. Let us dive a little deeper to understand how physics-based models can help in making reliable, high-fidelity predictions.
Physics-Informed Machine Learning Models
Taken together, physics-informed ML/AI models are a combination of ML/AI models, physics-based models (a set of ODEs/PDEs), and a suitable optimizer. The concept rests on the assumption that ML models can learn behavior, patterns, and trends from historical data, whereas the physics models bring in the domain constraints, and the optimizer bridges the gap between the two. Please refer to the image below for a detailed understanding.
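One common way to write down this combination is as a single objective that the optimizer minimizes. The expression below is a generic sketch, not the exact formulation used in this article: the symbols (model parameters \theta, unknown physical constants \lambda, weighting factor w, differential operator \mathcal{N}, and the sample counts N_d and N_c) are all assumed notation introduced here for illustration.

```latex
\mathcal{L}(\theta, \lambda)
  = \underbrace{\frac{1}{N_d} \sum_{i=1}^{N_d}
      \bigl( \hat{y}_\theta(z_i) - y_i \bigr)^2}_{\text{data misfit (ML part)}}
  \; + \; w \,
    \underbrace{\frac{1}{N_c} \sum_{j=1}^{N_c}
      \bigl( \mathcal{N}_\lambda[\hat{y}_\theta](z_j) \bigr)^2}_{\text{physics residual (ODE/PDE part)}}
```

Here \hat{y}_\theta is the data-driven model, (z_i, y_i) are historical measurements, and \mathcal{N}_\lambda[\hat{y}_\theta] = 0 is the governing equation evaluated at chosen check points z_j. A larger w makes the fit trust the physics more, and a smaller w makes it trust the data more.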
Heat Diffusion Equation
Let us try to understand this with a very simple example.
The system considered is a straight 1D metal rod of unit length (1 meter). The temperature distribution along the rod at any given time can be readily obtained from the physics-based equation for the 1D unsteady-state heat balance.
The following set of physics-based governing equations acts as the constraints for our data-driven model:
Temperature is a function of temporal and spatial domains: T(t, x)
The unsteady-state heat balance would be:
∂T/∂t = α · ∂²T/∂x², where α (alpha) is the thermal diffusivity of the rod material
To solve this equation we require 1 initial condition and 2 boundary conditions, as given below:
T(0, x) = 25 °C
T(t, 0) = 100 °C
dT(t, L)/dx = 0
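To make the governing equation and its conditions concrete, here is a minimal sketch of a forward solver using an explicit finite-difference scheme and only the Python standard library. It assumes the far end of the rod is insulated (zero temperature gradient at x = L). The function name `simulate_rod`, the grid settings, and the value of alpha are hypothetical choices for illustration, not values from a real rod.

```python
def simulate_rod(alpha, t_end, length=1.0, n_nodes=11, dt=0.5,
                 t_initial=25.0, t_hot=100.0):
    """Return the temperature profile (deg C) along the rod at time t_end (s).

    Explicit finite-difference scheme for dT/dt = alpha * d2T/dx2.
    """
    dx = length / (n_nodes - 1)
    r = alpha * dt / dx ** 2          # grid Fourier number
    if r > 0.5:
        raise ValueError("time step too large: need alpha*dt/dx^2 <= 0.5")

    temps = [t_initial] * n_nodes     # initial condition T(0, x) = 25
    temps[0] = t_hot                  # boundary condition T(t, 0) = 100

    for _ in range(int(round(t_end / dt))):
        new = temps[:]                # node 0 keeps its fixed value
        for i in range(1, n_nodes - 1):
            new[i] = temps[i] + r * (temps[i + 1] - 2.0 * temps[i] + temps[i - 1])
        # Assumed insulated far end (zero gradient): mirror node T[n] = T[n-2]
        new[-1] = temps[-1] + 2.0 * r * (temps[-2] - temps[-1])
        temps = new
    return temps


ALPHA_GUESS = 1e-4  # m^2/s, hypothetical value for illustration only
profile = simulate_rod(ALPHA_GUESS, t_end=600.0)
print([round(v, 1) for v in profile])
```

The explicit scheme is chosen here only because it is short and easy to read. It is stable only when alpha*dt/dx^2 stays at or below 0.5, which is why the function checks that ratio before stepping.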
Note: This analysis can be scaled to very complex systems and geometries in 2D as well as 3D. For ease of understanding, we use a 1D example. The number of independent dimensions, including time, is equivalent to the number of input features used in the ML/AI problem.
At this point, you might be thinking that we can easily solve this equation using any open-source ODE/PDE solver. Yes, you’re right, but imagine a situation where you are dealing with a large set of coupled ODEs/PDEs and most of the constants in the equations are unknown. Now that becomes a hefty task, doesn’t it? You may need a literature survey and expert know-how to estimate the values of those unknown parameters. Moreover, there is one more challenge: even with well-established physics, as in this case, the equations are still an approximation of the real-life scenario. In reality, there will always be a deviation between the solution of the equations and the actual data.
So, let us now assume that we have actual historical temperature data for the rod, recorded at 10 equidistant locations along it at 1-minute intervals.
The data would look something like this:
The challenge with this rod is that we do not know how the temperature will behave along its length over time. We also do not know the material property “alpha” for the rod. Can we use historical data to estimate it? This looks like a curve-fitting task in Excel, but Excel is limited to linear, exponential, and a few other trend types. Real-life problems and dynamics are too non-linear and complex to solve in Excel.
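One simple way to estimate alpha from such readings is to treat the heat equation itself as a regression: the time difference of each reading is regressed on the spatial second difference, and the slope through the origin is alpha. The sketch below is an assumption-laden illustration, not the model shown in this article. The readings are synthetic, generated by the hypothetical helper `make_synthetic_readings` using the same difference scheme, so alpha is recovered almost exactly by construction. The true value of 2e-5 m^2/s, the sensor layout (10 locations including both ends), and the insulated far end are all assumed.

```python
def make_synthetic_readings(alpha, n_locations=10, n_minutes=120,
                            length=1.0, dt=60.0):
    """Hypothetical stand-in for plant data: one row per minute, one column per location."""
    dx = length / (n_locations - 1)
    r = alpha * dt / dx ** 2
    if r > 0.5:
        raise ValueError("unstable step: need alpha*dt/dx^2 <= 0.5")
    row = [100.0] + [25.0] * (n_locations - 1)   # hot end at x = 0, rest at 25
    readings = [row]
    for _ in range(n_minutes):
        nxt = row[:]
        for i in range(1, n_locations - 1):
            nxt[i] = row[i] + r * (row[i + 1] - 2.0 * row[i] + row[i - 1])
        nxt[-1] = row[-1] + 2.0 * r * (row[-2] - row[-1])  # assumed insulated end
        row = nxt
        readings.append(row)
    return readings, dx, dt


def estimate_alpha(readings, dx, dt):
    """Least-squares slope (no intercept) of dT/dt against d2T/dx2."""
    num = 0.0
    den = 0.0
    for n in range(len(readings) - 1):
        now, nxt = readings[n], readings[n + 1]
        for i in range(1, len(now) - 1):          # interior locations only
            dT_dt = (nxt[i] - now[i]) / dt
            d2T_dx2 = (now[i + 1] - 2.0 * now[i] + now[i - 1]) / dx ** 2
            num += dT_dt * d2T_dx2
            den += d2T_dx2 ** 2
    return num / den


TRUE_ALPHA = 2e-5  # m^2/s, assumed only to generate the synthetic readings
readings, dx, dt = make_synthetic_readings(TRUE_ALPHA)
alpha_hat = estimate_alpha(readings, dx, dt)
print(f"estimated alpha = {alpha_hat:.3e} m^2/s")
```

On real sensor data the picture is less tidy: finite differences amplify measurement noise, so the readings would typically need smoothing first, and the estimate would carry some bias from the coarse 1-minute sampling.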
Snapshots of Hybrid model based on Linear Regression using Seeq Datalab
To combat this challenge, we will use “physics-based ML/AI” to estimate the value of alpha and to predict the temperature of the rod at any given time and location. This model enables engineers to look beyond CFD models, which, although very useful, become challenging when one has to deal with complex geometry and meshing or has to simulate longer time spans, which increases the computational load. One additional benefit of the physics-based ML/AI model is that it requires a smaller volume of data than traditional data-driven models. Below is a snippet of a physics-based ML/AI model, which can be used as an approach for complex systems where sufficient data is not available, or where learning from the data must be combined with learning the physics for enhanced, reliable, and reconciled accuracy.
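As a rough illustration of the idea, and not the author's actual model, the sketch below scores any candidate model T(t, x) by combining a data misfit with a penalty for violating the heat equation at chosen check points. All names here (`hybrid_loss`, `make_surrogate`, `weight`, the step sizes) are hypothetical. The derivatives are taken by finite differences to keep the example dependency-free, whereas a neural-network implementation would normally use automatic differentiation. The surrogate used in the demo is a single decaying sine mode, chosen only because it satisfies the heat equation exactly.

```python
import math


def hybrid_loss(model, alpha, samples, check_points, weight=1.0,
                h_t=1.0, h_x=0.01):
    """Data misfit plus weighted heat-equation residual for a callable model(t, x).

    samples:      list of (t, x, measured_T)
    check_points: list of (t, x) where the physics residual is evaluated
    """
    data = sum((model(t, x) - T) ** 2 for t, x, T in samples) / len(samples)
    physics = 0.0
    for t, x in check_points:
        dT_dt = (model(t + h_t, x) - model(t - h_t, x)) / (2.0 * h_t)
        d2T_dx2 = (model(t, x + h_x) - 2.0 * model(t, x)
                   + model(t, x - h_x)) / h_x ** 2
        physics += (dT_dt - alpha * d2T_dx2) ** 2   # residual of dT/dt = alpha*d2T/dx2
    physics /= len(check_points)
    return data + weight * physics


def make_surrogate(alpha, length=1.0):
    """Hypothetical stand-in for a trained model: one exact heat-equation mode."""
    k = math.pi / (2.0 * length)
    return lambda t, x: 100.0 - 75.0 * math.exp(-alpha * k * k * t) * math.sin(k * x)


TRUE_ALPHA = 2e-5  # m^2/s, assumed for the demo
model = make_surrogate(TRUE_ALPHA)
points = [(60.0 * m, 0.1 * j) for m in range(1, 11) for j in range(1, 10)]
samples = [(t, x, model(t, x)) for t, x in points]  # synthetic "measurements"

loss_right = hybrid_loss(model, TRUE_ALPHA, samples, points)
loss_wrong = hybrid_loss(model, 2.0 * TRUE_ALPHA, samples, points)
print(loss_right, loss_wrong)
```

Because the data term is identical in both calls, the gap between the two losses comes entirely from the physics term. That gap is the signal an optimizer would follow when alpha is treated as an unknown to be learned alongside the model parameters.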
Principal Engineer – Analytics