Quantifying Activity Influence on a Scheduling Target

A common output request from stochastic scheduling is the identification of a "critical path". The critical path is naively understood to be the identifiable set of tasks that must occur as specified by the baseline schedule to reach a target. This approach is generally limited to schedules without decision branches or alternative pathing, where a progression of tasks forms a linear timeline for a piece of equipment. In reality, for open cut operations, things move around. The utilisation of fleet is incredibly important, and schedules can be constructed which allow for advancement off the priority path if said path is currently constrained. The notion of identifying a "critical path" is no longer valid: when comparing cases which have branched, the sets of tasks have changed, and what would be deemed "critical" between cases could be contradictory.

The problem can be reframed to extract the statistically significant factors contributing to variability against a baseline target. These factors are also temporally spaced; a significant contributing factor in one period may not be significant in another. Rather than call these factors "critical", let us prefer the term "influential". In statistical parlance, the factors are the contributing components explaining the variability in a target value.

So what are these factors? Well, it depends on the questions being asked of the schedule. As an example, a demonstration model presented below uses stochastic scheduling run multiple times to answer specific questions that are typically asked of a mine schedule.

Proof of Concept - Variability in a Single Machine

To prove up the statistical backing of the influence output, only a single machine, the CAT6060 in this case, has variability applied to it. This machine does not work any of the Coal process, yet we will use CoalTonnes as the target. Running 1500 simulations yields the following temporal distributions against the baseline schedule. The important detail to note is that there is variability in the target despite there being no variability in the machine that works the coal. This is an example of dependent tasks being affected by variability upstream.

Ordinary Least Squares can be employed to gain an understanding as to what is impacting the observed variability. To understand how a piece of equipment is impacting the variability in coal tonnes, the total observed coal tonnes can be plotted against a single machine's operating hours. A scatter plot for a single quarterly period follows.
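For illustration, a plot like this could be produced with a short routine along the following lines; the DataFrame and its column names are hypothetical stand-ins for the per-simulation results, not the demonstration model's actual schema.

```python
# A minimal sketch of the inspection step, assuming the per-simulation results
# for one period are held in a DataFrame with hypothetical column names.
import matplotlib.pyplot as plt
import pandas as pd

def plot_target_vs_hours(sims: pd.DataFrame,
                         machine: str = "CAT6060_ophrs",
                         target: str = "CoalTonnes") -> None:
    """Scatter the target metric against one machine's operating hours."""
    fig, ax = plt.subplots()
    ax.scatter(sims[machine], sims[target], s=8, alpha=0.4)
    ax.set_xlabel(f"{machine} (operating hours)")
    ax.set_ylabel(target)
    ax.set_title("Simulated target vs operating hours, single period")
    plt.show()
```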

As expected, there is only variability in the CAT6060 hours. We want to test the hypothesis that the variability in these hours contributes to the variability in the coal tonnes. For this, we can construct a regression like so:

$$CoalTonnes = \beta_0 + \beta_1\cdot CAT6040_{ophrs} + \beta_2\cdot CAT6060_{ophrs} + \beta_3\cdot DozerFleet_{ophrs}$$
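As an illustrative sketch only (not the demonstration model itself), a regression of this form could be fitted with statsmodels against a synthetic stand-in for the per-simulation results; the column names and numbers below are assumptions made for the sake of a runnable example.

```python
# A hedged sketch: fit the regression above on synthetic data where only the
# CAT6060 hours vary materially, mirroring the single-machine proof of concept.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 1500  # number of simulations

sims = pd.DataFrame({
    "CAT6040_ophrs": rng.normal(1200.0, 0.5, n),     # effectively no variability
    "CAT6060_ophrs": rng.normal(900.0, 40.0, n),     # variability applied
    "DozerFleet_ophrs": rng.normal(2100.0, 0.5, n),  # effectively no variability
})
sims["CoalTonnes"] = 50_000 + 65.0 * sims["CAT6060_ophrs"] + rng.normal(0, 2_000, n)

X = sm.add_constant(sims[["CAT6040_ophrs", "CAT6060_ophrs", "DozerFleet_ophrs"]])
model = sm.OLS(sims["CoalTonnes"], X).fit()
print(model.summary())  # estimates, standard errors and t-values per variable
```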

Solving the OLS we come up with the following statistics:

| Explanatory Variable | Estimate | Std Error | t value |
| --- | --- | --- | --- |
| Alpha\CAT6040 | 12.560 K | 8.759 M | 0.001 |
| Alpha\CAT6060 | 65.283 | 5.261 | 12.407 |
| Alpha\Dozer Fleet | -31.064 K | 23.193 M | -0.001 |

The t-values are used to test the significance of each explanatory variable. The t-value is simply computed as

$$t_i = \frac{\beta_i}{SE(\beta_i)}$$

The standard error is more involved, requiring the diagonal of the variance-covariance matrix of the variables, scaled by the root mean square of the residuals.

$$SE(\beta_i) = RMSR\sqrt{C_{ii}}$$
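To make these two formulas concrete, a minimal sketch of the calculation from the normal equations is given below; `X` is assumed to be the design matrix (including an intercept column) and `y` the target vector, both as NumPy arrays.

```python
# A sketch of beta, SE(beta_i) and t_i computed directly from the normal
# equations, under the assumption that X includes an intercept column.
import numpy as np

def ols_t_values(X: np.ndarray, y: np.ndarray):
    n, p = X.shape
    C = np.linalg.pinv(X.T @ X)         # (X'X)^-1; pseudo-inverse for robustness
    beta = C @ X.T @ y                  # OLS coefficient estimates
    residuals = y - X @ beta
    # Root mean square of the residuals, with the usual degrees-of-freedom
    # correction used for coefficient standard errors.
    rmsr = np.sqrt(residuals @ residuals / (n - p))
    se = rmsr * np.sqrt(np.diag(C))     # SE(beta_i) = RMSR * sqrt(C_ii)
    return beta, se, beta / se          # t_i = beta_i / SE(beta_i)
```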

To test if an explanatory variable is contributing, we set up a null hypothesis that the coefficient is zero (that is, it does not contribute).

$$H_0: \beta_i = 0$$

Given a significance level of α = 0.1, we reject the null hypothesis if the absolute t-value is greater than 1.64. Framed another way, an explanatory variable is statistically significant for |t-values| greater than 1.64. Thus, for the example above, only the CAT6060 likely influences the coal tonnes.
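A small sketch of this decision rule follows, reusing the `model` fitted in the earlier statsmodels sketch; with 1500 simulations the t distribution is effectively normal, so the normal quantile is used for the critical value.

```python
# Flag variables whose |t| exceeds the two-sided critical value at alpha = 0.1.
from scipy.stats import norm

alpha = 0.1
critical = norm.ppf(1 - alpha / 2)                          # ~1.645
influential = model.tvalues.drop("const").abs() > critical
print(influential)  # only CAT6060_ophrs is expected to be True in this case
```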

This aligns with our intuition from the scatter plot, but importantly, is backed by statistical rigour. It also allows us to abstract the process and apply it against more than just equipment operating hours. If a question is being asked about the constraining "process", the same algorithm can be employed. This allows one to ask: "During period P, is process X an influential variable on target T?".

To further highlight this point, we can take a look at another period: 2024-03. The scatter plot looks like so, with the dozer fleet now having variability (this is derived from the variability in the prestrip).

It looks like the dozers' operating hours may now have some influence over the coal target, so to test this rigorously we again solve the OLS, repeating the process as before.

| Explanatory Variable | Estimate | Std Error | t value |
| --- | --- | --- | --- |
| Alpha\CAT6040 | -111.105 | 64.143 M | -1.732e-6 |
| Alpha\CAT6060 | 85.217 | 17.091 | 4.985 |
| Alpha\Dozer Fleet | 952.472 | 22.497 | 42.337 |

This time, the null hypothesis is rejected for both the CAT6060 and the Dozer Fleet, and both these variables are deemed influential equipment.

Generalising the Method

To show how this method can be applied to stochastic schedules with multiple variable inputs, the demonstration simulation was run again, this time varying the inputs of the CAT6060, CAT6040, and the Dozer Fleet. The periodic distribution of coal flow is presented below.

Rather than focus on individual periods, an aggregated view of the method is constructed. For each period, the method presented above is run, and the results can be plotted as a temporal bar chart of the influential variables. The plot is of the coefficients of the fitted regression, only including the variables that pass the significance test, along with the regression's adjusted R-squared value.
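A hedged sketch of how the per-period data behind such a chart might be assembled is below; the `Period` column, the explanatory names, and the significance threshold mirror the assumptions of the earlier sketches rather than the demonstration model's actual schema.

```python
# Per period: fit the OLS, keep only coefficients passing the significance
# test, and record the regression's adjusted R-squared for the markers.
import pandas as pd
import statsmodels.api as sm
from scipy.stats import norm

EXPLANATORY = ["CAT6040_ophrs", "CAT6060_ophrs", "DozerFleet_ophrs"]
CRITICAL = norm.ppf(1 - 0.1 / 2)  # two-sided critical value at alpha = 0.1

def influence_by_period(df: pd.DataFrame, target: str = "CoalTonnes") -> pd.DataFrame:
    """df holds one row per (simulation, period); returns the significant
    coefficients and adjusted R-squared for each period."""
    rows = []
    for period, group in df.groupby("Period"):
        X = sm.add_constant(group[EXPLANATORY])
        fit = sm.OLS(group[target], X).fit()
        for name in EXPLANATORY:
            if abs(fit.tvalues[name]) > CRITICAL:
                rows.append({"Period": period, "Variable": name,
                             "Coefficient": fit.params[name],
                             "AdjR2": fit.rsquared_adj})
    return pd.DataFrame(rows)
```

The resulting frame can then be pivoted into the temporal bar chart of coefficients, with the adjusted R-squared values plotted as the markers.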

It is important to frame how to interpret this chart.

  1. The bars represent the contribution to CoalTonnes for every one-unit increase in OperatingHours,
  2. The markers represent the percentage of variability in CoalTonnes explained by the equipment's OperatingHours.

This is why this metric is considered an influence: it describes how much of the variability in a target is explained by the variability in the equipment. It can reveal where effort is best spent on an equipment's path, and the sensitivity of derived processes.

This notion is also general: the examples presented focus on an equipment's operating hours with ROM coal tonnes as the target, but similar questions can be asked with differing target metrics and different 'variable' metrics. For example, the method can be repeated with delay hours as the variable. This reveals how much influence different types of delays have on a specified target. This is especially important to understand when modelling discrete risk events such as weather or manning issues.

While statistically rigorous, the influence metric is not a panacea for engineering intuition. It assists in describing the variability in stochastic scheduling outputs, yet knowledge of the schedule can help describe the cause of influence. In the example above, the dozer fleet routinely has the largest influence over the coal target. This makes sense, since it is the main predecessor activity required to uncover the basal coal seam. What is not immediately clear is that the variability in prestrip required to unlock dozer push could be having an impact. The influence method would need to be undertaken again, instead asking the question: "During period P, what variables X are influencing the dozer push target T?".

With stochastic scheduling burgeoning into an applicable tool for schedule interrogation, it is important to develop the correct statistical tools and reporting methods that can adequately inform us. The method presented here is a statistically rigorous test to obtain the influential factors on a given target metric.
