Abstract

Accurate prediction of the spatial distribution of petroleum resources is important for petroleum exploration. Mahalanobis distance (MD) is a popular and effective method for predicting the spatial distribution of oil and gas. However, the method assigns equal weights to all variables, exaggerates secondary variables, and is prone to misjudgment when distances are close or equal, which impairs classification accuracy. To solve these problems, this paper proposes a novel Mahalanobis distance method that uses a genetic algorithm to optimize attribute weights (GA-MD). Using the Sangonghe Formation in the hinterland of the Junggar Basin as an example, the validity of GA-MD was evaluated on an exploratory well data set. Compared with current mainstream methods, GA-MD achieves the highest accuracy, an improvement of 2–6.2%. The practical value of the proposed method is verified by the predicted oil and gas probability map: GA-MD not only shows a higher oil- and gas-bearing probability in the reserve areas but also has better trend extrapolation ability than the other methods. Based on the GA-MD results, the favorable zones with remaining petroleum resources in the Sangonghe Formation in the hinterland of the Junggar Basin were visualized, and three types of favorable oil and gas distribution areas were selected. These favorable areas provide a basis for quantitative decision-making on the next drilling strategy and on the direction of oil and gas exploration deployment in the study area.

Keywords

Petroleum resources; spatial distribution prediction; Mahalanobis distance; genetic algorithm; Sangonghe Formation

Introduction

Petroleum exploration is mainly conducted to find economically recoverable reserves, but it is a high-risk, high-cost investment activity with many complexities and uncertainties (Hamzeh and Karimipour, 2020b). From 1980 to 2016, China National Petroleum Corporation (CNPC) drilled 40730 exploratory wells in prospective areas, of which only 17371 wells had economic oil and gas flow, accounting for 42.6% (Guo et al., 2019). This indicates that the drilling success rate can be further improved. Predicting the spatial distribution of petroleum resources makes exploration risks visible, which facilitates decision making for a series of exploration activities, such as target ranking, drilling planning, and resource management, and benefits exploration and development. Thus, an effective prediction algorithm is crucial for petroleum exploration.

There has been great interest in predicting the spatial distribution of oil and gas in the past decades. Three primary types of methods have been used: knowledge-driven, data-driven, and hybrid methods. The knowledge-driven method is based on the knowledge and experience of an expert, who comprehensively studies the main geological elements required for oil and gas accumulation in the region. The expert evaluates the relative importance of the input data sets and model parameters, and the oil- and gas-bearing property of the study area is then evaluated through calculation. Examples include the overlay method (White, 1988, 1993; Otis and Schneidermann, 1997; Guo et al., 2022a), the fuzzy integration method (Chen et al., 2002), and the grey relational analysis method (Sheng et al., 2020). This type of method is suitable for areas in the early and medium stages of exploration with relatively simple geologic structures, but it is usually affected by human factors and is therefore subjective. The data-driven method mainly uses data mining, artificial intelligence, and related techniques, drawing on the exploration wells and geological data in areas where oil and gas have been found. The relationship between oil and gas potential and geological factors is analyzed, the extracted relationship is used to estimate the model parameters, and the input data set is integrated into the model to quantitatively calculate the spatial changes of oil and gas distribution and thereby establish the exploration risk. Examples include the Mahalanobis distance (MD) method (Hu et al., 2007, 2009; Xie et al., 2011), the support vector machine method (Chen et al., 2012), the evidential belief functions model (Amiri et al., 2015a, 2015b), the logistic regression method (Zhu et al., 2018), and the Bayesian network method (Ren et al., 2020; Guo et al., 2022b). The data-driven method may yield relatively objective results and has been widely used in areas at the medium stage of exploration, but it requires a lot of data. The hybrid method combines the merits of the knowledge- and data-driven methods to predict the spatial oil and gas distribution in the target reservoirs. Examples include the optimized fuzzy ELECTRE method (Hamzeh and Karimipour, 2020a) and the optimized fuzzy preference ranking organization method (Hamzeh and Karimipour, 2020a).

MD has been widely applied in the prediction of spatial oil and gas distribution, but it assigns equal weights to all variables, exaggerates the effects of secondary variables, and misjudges samples whose distances are close or equal. Therefore, the objective of this paper is to propose an efficient algorithm that addresses these MD problems and to predict the spatial distribution of oil and gas resources, so as to provide a decision-making basis for oil and gas exploration. The main innovations of this paper are: (1) The Mahalanobis distance method based on a genetic algorithm is proposed for the first time. (2) The effectiveness and superiority of this method are illustrated by an application example. (3) The application of this method to the spatial distribution of oil and gas in the Sangonghe Formation points out the direction for future oil and gas exploration.

This paper presents the methodology and workflow of the MD method in section 2, a case study involving geological setting, key parameters, experimental design and results, and predicted prospects with promising resources in section 3, and conclusions in section 4.

Methodology descriptions

Approach

Drilling and seismic data provide the primary information for venture exploration and prospect prediction. Completed exploratory wells can be treated as a set, and the wells are classified into two categories: wells with economic oil and gas flow (productive wells) and wells without economic oil and gas flow (non-productive wells). The subset consisting of productive wells is called the population of productive wells, denoted as $G_{p}$. The subset consisting of non-productive wells is called the population of non-productive wells, denoted as $G_{n p}$. These two populations correspond to two data matrices, $X_{p}$ and $X_{n p}$, which consist of the observations of geological features and attributes related to these two populations. The geological features and geophysical attributes $X$ at the sites to be drilled may be correlated with the known features of the populations to quantitatively predict hydrocarbon accumulation before well drilling. Correlation based on similarity is essentially a classification problem. By investigating the distribution of these factors and attributes, a scientifically appropriate statistical analysis method is used to objectively discriminate the types of exploratory wells to be drilled (productive or non-productive wells) and thus to calculate their probability of economic oil and gas flow.

Workflow of MD

The MD method has been widely used in petroleum exploration, especially in identifying oil and gas wells, predicting the spatial distribution of petroleum resources, and drilling site optimization. The workflow of MD integration of geological variables combined with the Bayesian statistical method for oil and gas prediction is shown in Figure 1. The major steps are as follows.

Data collection

Collect all available exploratory well and geoscience data in the study area and classify the exploratory wells into two categories, productive and non-productive wells, denoted as $G_{p}$ and $G_{n p}$, respectively.

Discriminant parameter determination

Establish geologic factors which dominate hydrocarbon accumulation in the study area based on drilling, seismic, and basin modeling data combined with expert knowledge, and establish the system of discriminant parameters.

Data preprocessing

Data preprocessing focuses on checking parameter effectiveness and establishing data sets.

Plotting of MD-based petroleum probability chart

$X$ is the exploratory well data set composed of m observations and n variables. $X_{p}$ and $X_{n p}$ are the productive and non-productive well datasets, respectively, and $X_{p}$ + $X_{n p}$ = $X$ . $X_{p}$ and $X_{n p}$ contain $r$ and $q$ observations, respectively, i.e., $r + q = m$ .

Figure 1.

Workflow of spatial oil and gas distribution prediction by the MD method.

The MD method calculates the distance between the observation $x_{i}$ of attribute $X_{i}$ at the sample point to be predicted and the known attributes of the two populations ($X_{p}$ and $X_{n p}$) and then categorizes the sample point in accordance with the distance. It is assumed that $x_{i} = \{x_{i 1}, x_{i 2}, \dots, x_{i n}\}$ is derived from the population $G_{p}$, and the MD between $x_{i}$ and $X_{p}$ is defined as follows:

$$MD_{i}^{p} = \sqrt{(x_{i} - u_{p})^{T} S_{p}^{-1} (x_{i} - u_{p})}, \tag{1}$$

where $u_{p} = \{u_{1}, u_{2}, \dots, u_{n}\}$ is the mean vector of the productive well data set $X_{p}$, and $S_{p}$ is the covariance matrix of the data set $X_{p}$. An MD set with $r$ samples, i.e., ${MD}^{p} = \{MD_{1}^{p}, MD_{2}^{p}, \dots, MD_{r}^{p}\}$, can be obtained by calculating the MD between all the observations in $G_{p}$ and $X_{p}$. In accordance with ${MD}^{p}$, the probability density function, $f_{p}(X)$, can be calculated to represent the MD of productive wells. Similarly, the probability density function, $f_{n p}(X)$, can be obtained to indicate the MD of non-productive wells.
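As an illustration of Eq. (1), the following minimal Python sketch computes a mean vector, a covariance matrix, its inverse, and the MD of one sample. The function names and the toy two-variable data are assumptions made for this example and are not taken from the study. The unbiased ($m - 1$) covariance estimator is also an assumption, since the text does not state which estimator was used.

```python
from math import sqrt

def mean_vector(rows):
    """Column-wise mean of a list of observations (u_p in Eq. 1)."""
    n = len(rows[0])
    return [sum(r[j] for r in rows) / len(rows) for j in range(n)]

def covariance_matrix(rows):
    """Sample covariance matrix (S_p in Eq. 1), assuming the m - 1 estimator."""
    m, n = len(rows), len(rows[0])
    u = mean_vector(rows)
    return [[sum((r[a] - u[a]) * (r[b] - u[b]) for r in rows) / (m - 1)
             for b in range(n)] for a in range(n)]

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def mahalanobis(x, mean, cov_inv):
    """Eq. (1): sqrt((x - u)^T S^-1 (x - u))."""
    d = [xi - ui for xi, ui in zip(x, mean)]
    n = len(d)
    return sqrt(sum(d[a] * cov_inv[a][b] * d[b]
                    for a in range(n) for b in range(n)))

# Hypothetical observations of two attributes at four wells.
toy_wells = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
u_toy = mean_vector(toy_wells)
s_inv_toy = invert(covariance_matrix(toy_wells))
md_toy = mahalanobis([3.0, 1.0], u_toy, s_inv_toy)
```

Applying `mahalanobis` to every observation of a population against that population's own mean and covariance would give the set ${MD}^{p}$ described above.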

According to the Bayes formula, the conditional probability that the sample $x = \{x_{1}, x_{2}, \dots, x_{n}\}$ comes from $G_{p}$ is calculated using the following equation:

$$P(G_{p} \mid x) = \frac{q_{p} f_{p}(X)}{q_{p} f_{p}(X) + q_{n p} f_{n p}(X)}, \tag{2}$$

where $q_{p}$ and $q_{n p}$ are the prior probabilities of productive and non-productive wells, respectively. The petroleum probability template for different MDs (Figure 2) was established based on the conditional petroleum probability calculated using Eq. (2) at all drilled sites in the exploratory well data set $X$. This template is the basis for predicting the petroleum probability.
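The Bayes step in Eq. (2) can be sketched in a few lines of Python. The function name and the density values below are hypothetical. Taking the priors as the class proportions of the 203-well data set (109 productive, 94 non-productive) is an assumption for illustration, since the text does not state how $q_{p}$ and $q_{n p}$ were chosen.

```python
def posterior_productive(f_p, f_np, q_p, q_np):
    """Eq. (2): posterior probability that a sample belongs to G_p.

    f_p, f_np -- density values of the sample under each population
    q_p, q_np -- prior probabilities of productive / non-productive wells
    """
    numerator = q_p * f_p
    denominator = numerator + q_np * f_np
    if denominator == 0.0:
        raise ValueError("both weighted densities are zero")
    return numerator / denominator

# Assumed priors: class proportions of the exploratory well data set.
q_p, q_np = 109 / 203, 94 / 203

# Hypothetical density values for one site.
p_site = posterior_productive(f_p=0.30, f_np=0.10, q_p=q_p, q_np=q_np)
```

When the two densities are equal, the posterior reduces to the prior $q_{p}$, which is a quick sanity check on any implementation.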

Petroleum probability calculation and risk visualization

Figure 2.

MD risk probability template.

Based on the template in Figure 2, the MDs at the sites to be predicted in the grid data set are converted into the petroleum probability using the interpolation method. The spatial distribution is generated using the Kriging interpolation and then visualized to delineate the spatial distribution of petroleum resources (risks).
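The conversion from MD to petroleum probability can be illustrated with a small lookup routine. This is a sketch under stated assumptions: the text only says "the interpolation method", so piecewise-linear interpolation is assumed here, and the template values and the function name are hypothetical rather than read from Figure 2. The Kriging step is not covered.

```python
from bisect import bisect_right

def probability_from_template(md, template):
    """Piecewise-linear lookup of petroleum probability for a given MD.

    template -- list of (md_value, probability) pairs with strictly
                increasing md_value; values outside the range are clamped.
    """
    xs = [point[0] for point in template]
    ys = [point[1] for point in template]
    if md <= xs[0]:
        return ys[0]
    if md >= xs[-1]:
        return ys[-1]
    k = bisect_right(xs, md)  # xs[k - 1] <= md < xs[k]
    t = (md - xs[k - 1]) / (xs[k] - xs[k - 1])
    return ys[k - 1] + t * (ys[k] - ys[k - 1])

# Hypothetical template: probability decreases as MD to productive wells grows.
toy_template = [(0.0, 0.9), (2.0, 0.5), (4.0, 0.1)]
grid_mds = [0.5, 1.0, 3.0, 6.0]
grid_probabilities = [probability_from_template(md, toy_template) for md in grid_mds]
```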

GA-MD

As a key feature of MD, an inverse matrix of the covariance matrix is introduced into the equation of distance calculation, to yield dimensionless variables and alleviate the influence of correlation between variables based on distance. However, the MD method is defective since it exaggerates the variables with small changes and may misjudge in the context of close or equal distance. We developed a novel MD method based on genetic algorithm for variable weight optimization to effectively improve the accuracy of the algorithm. Eq. (1) is converted into the following equation by adding the weights:

$$MD_{i}^{p} = \sqrt{(x_{i} - u_{p})^{T} [W^{T} S_{p}^{-1} W] (x_{i} - u_{p})}, \tag{3}$$

$$W = \begin{bmatrix} W_{1} & & & 0 \\ & W_{2} & & \\ & & \ddots & \\ 0 & & & W_{n} \end{bmatrix}. \tag{4}$$

In Eq. (4), $W$ is a diagonal weight matrix and can be solved to obtain the MD in Eq. (3). How to solve for $W$ is an open question. Our solution is a genetic algorithm, which uses the cooperation of many individuals in the population to solve for $W$. A number of tests indicated that the weight $W_{i} \in [0.5, 2]$. Figure 3 shows the workflow of GA-MD. The workflow mainly includes four steps: (a) initialize the parameters, including the population $P$, the maximum iterations $MaxIt$, and the current generation $it = 0$; (b) calculate the degree of fitness (accuracy) for each individual in the population; (c) create the next generation of the population through selection, crossover, and mutation; and (d) set $it = it + 1$ and judge whether $it$ is equal to $MaxIt$. If the answer is yes, output the optimum weight $W_{best}$ of the current generation. If the answer is no, return to step (b).
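Steps (a) to (d) and the weighted distance of Eq. (3) can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: the text does not specify the selection, crossover, or mutation operators, so binary tournament selection, uniform crossover, random-reset mutation, and elitism are all assumptions, as are the function names, the mutation rate, and the toy fitness function. In GA-MD the fitness would be the classification accuracy obtained with a given weight vector.

```python
import random
from math import sqrt

W_MIN, W_MAX = 0.5, 2.0  # weight range reported in the text

def weighted_mahalanobis(x, mean, cov_inv, weights):
    """Eq. (3) for diagonal W: scale each deviation by its weight first."""
    d = [w * (xi - ui) for w, xi, ui in zip(weights, x, mean)]
    n = len(d)
    return sqrt(sum(d[a] * cov_inv[a][b] * d[b]
                    for a in range(n) for b in range(n)))

def optimize_weights(fitness, n_vars, pop_size=20, max_it=5,
                     mutation_rate=0.1, seed=0):
    """Return (best weight vector, best fitness per generation)."""
    rng = random.Random(seed)
    # (a) initialise the population with random weights in [W_MIN, W_MAX]
    population = [[rng.uniform(W_MIN, W_MAX) for _ in range(n_vars)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(max_it):
        # (b) evaluate fitness and remember the best individual
        best = max(population, key=fitness)
        history.append(fitness(best))
        # (c) next generation: elitism (assumed), then selection,
        #     crossover, and mutation
        next_population = [best[:]]
        while len(next_population) < pop_size:
            parent_1 = max(rng.sample(population, 2), key=fitness)
            parent_2 = max(rng.sample(population, 2), key=fitness)
            child = [a if rng.random() < 0.5 else b
                     for a, b in zip(parent_1, parent_2)]
            child = [rng.uniform(W_MIN, W_MAX)
                     if rng.random() < mutation_rate else gene
                     for gene in child]
            next_population.append(child)
        # (d) move on to the next generation
        population = next_population
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history

# Toy fitness (hypothetical): closeness to an arbitrary target weight vector.
toy_target = [1.5, 0.8, 1.0, 1.2]

def toy_fitness(weights):
    return -sum((w - t) ** 2 for w, t in zip(weights, toy_target))

w_best, best_per_generation = optimize_weights(toy_fitness, n_vars=4)
```

Because the best individual is copied unchanged into each new generation, the best fitness in this sketch never decreases from one generation to the next. The defaults `pop_size=20` and `max_it=5` mirror the settings reported in the experiments.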

Figure 3.

The flow chart of optimizing the weight of MD method based on GA.

Application example

Geological setting of the study area

The case study dealt with the Junggar Basin, a large petroliferous basin of $13 \times 10^{4}\ \mathrm{km}^{2}$ in northwest China. Recent major progress in petroleum exploration includes a large sandy conglomerate oil field discovered in the Mahu sag, northwestern Junggar Basin, a shale oil field in the Jimsar sag, southeastern Junggar Basin, and a high-yield deep oil field in southwestern Junggar Basin. The study area, 160 km in the east-west direction and 170 km in the north-south direction, covers $2.7 \times 10^{4}\ \mathrm{km}^{2}$ in the central basin. The area includes eastern Mahu sag, P1 West sag, Dongdaohaizi sag, Sannan sag, Dabasong uplift, Mosuowan uplift, Mobei uplift, and Luxi slope. The target reservoir is the Jurassic Sangonghe Formation $(J_{1 s})$, with the source rocks in the Permian Lower Wuerhe Formation $(P_{2 w})$ and the mud shale caprocks in the Jurassic area. The study area is structurally low in the south and high in the north, and the lithofacies is fine in the south and coarse in the north. Hydrocarbons migrated upward and from south to north to form fault nose, fault block, and lithologic-stratigraphic reservoirs. Well Qianshao-2, drilled by CNPC in 2019 on the west side of the Mobei bulge, yielded daily oil of $39.3\ \mathrm{m}^{3}$ and daily gas of $20.36 \times 10^{4}\ \mathrm{m}^{3}$ from the Sangonghe Formation, which demonstrated the exploration prospect of the Sangonghe Formation in the central Junggar Basin. By the end of 2019, there were more than 203 exploratory and appraisal wells penetrating the Sangonghe Formation with oil testing completed, and 3P (proven, probable, and possible) reserves of $1.128 \times 10^{8}$ t oil and $459 \times 10^{8}\ \mathrm{m}^{3}$ gas were discovered in the Mosuowan and Mobei bulges, Shixi, Shi’nan, and Xiayan. Figure 4 shows the location of the study area and the distribution of major oil and gas reservoirs.

Figure 4.

Location of study area and distribution of oil and gas reservoirs.

Key parameters and validity check

Based on the data of the 203 exploratory wells penetrating the Sangonghe Formation, and the petrophysical data, reserves, sedimentary facies maps, seismic structure maps, fault maps, trap maps, and basin modeling results of 54 reservoirs, four geologic factors were extracted to predict the spatial oil and gas distribution in the study area.

(a) Source rock (or hydrocarbon supply) conditions

These conditions indicate the probability of hydrocarbon supply, which ranges from 0 (probability of 0) to 1 (probability of 100%). The probability distribution is determined from the hydrocarbon-generating intensity map, the map of faults connecting the underlying source rocks, and the modeled oil and gas migration pathways (Guo et al., 2018). Table 1 lists the evaluation intervals for hydrocarbon supply conditions.

Table 1.

Evaluation value of hydrocarbon supply conditions.

Level	Hydrocarbon supply condition	Minimum	Maximum
I	Connecting hydrocarbon source fault + main pathway	0.7	0.8
II	Main pathway	0.4	0.6
III	Secondary pathway	0.3	0.5
IV	Southern near-source area	0.15	0.25
V	Northern far-source area	0.1	0.2

(b) Reservoir rock conditions

These conditions indicate the probability of reservoir rock occurrence, which is established based on the Sangonghe sedimentary facies (fluvial-delta facies) and the petrophysical data of oil and gas reservoirs. Table 2 lists the evaluation intervals for reservoir rock conditions.

Table 2.

Reservoir evaluation value.

Level	Microfacies	Minimum	Maximum
I	Underwater distributary channel	0.7	0.9
II	Sheet sand	0.6	0.8
III	Beach-bar and Sandy debris flow	0.5	0.7
IV	Interdistributary bay	0.3	0.5
V	Shallow lacustrine	0	0.3

(c) Trap conditions

These conditions indicate the probability of trap occurrence, which is set to 100% for areas with discovered oil and gas reservoirs, 70–80% for confirmed traps, 60–70% for traps to be confirmed, 30–60% for prognostic lithologic traps, and < 30% for other areas. Table 3 lists the evaluation intervals for trap conditions.

Table 3.

Trap evaluation value.

Level	Trap	Minimum	Maximum
I	Discovered hydrocarbon reservoir	1	1
II	Confirmed trap (structural type)	0.7	0.8
III	Un-confirmed trap (structural type)	0.6	0.7
IV	Lithologic lens (Beach bar, etc)	0.5	0.6
V	Lithologic barrier (Interdistributary bay, etc)	0.3	0.5

(d) Caprock & preservation conditions

These conditions indicate the probability of hydrocarbon sealing and preservation, which is determined by the overlying faults and unconformable weathered zones. The probability is set to 10–20% in areas where overlying faults are unfavorable for hydrocarbon preservation, 70–90% in areas where unconformable weathered clays function as good seals for hydrocarbons, and 40–60% in areas where mud shales serve as local seals. Table 4 lists the evaluation intervals for caprock & preservation conditions.

Table 4.

Caprock & preservation evaluation value.

Level	Caprock & preservation	Minimum	Maximum
I	Unconformable weathered clays	0.7	0.9
II	Mud shales	0.4	0.6
III	Overlying faults	0.1	0.2

The target interval is the Sangonghe Formation. We used the data of the 203 exploratory wells penetrating the Sangonghe Formation, including 109 productive and 94 non-productive wells (Figure 4). A labelled exploratory well dataset with 203 samples was established based on the quantitative assessment of hydrocarbon supply, reservoir conditions, trap conditions, and caprock & preservation conditions at each well site (Table 5).

Table 5.

Data set of 203 exploratory wells.

Number	Well name	HSC	RC	TC	CAPC	Well type
1	DD5	0.75	0.87	0.67	0.14	0
2	Fan003	0.79	0.12	1.00	0.60	1
3	Fan004	0.73	0.12	1.00	0.40	1
4	Fan2	0.74	0.51	0.51	0.76	1
5	Fan3	0.80	0.38	0.01	0.13	0
6	He8	0.16	0.32	1.00	0.57	1
7	Ji003	0.49	0.47	1.00	0.50	1
8	Ji008	0.46	0.83	0.36	0.45	0
9	Lu001	0.73	0.80	0.05	0.76	0
10	Lu002	0.73	1.00	0.68	0.88	1
…	…	…	…	…	…	…
203	Dong1	0.2	0.13	0.07	0.41	0

HSC: evaluation value of hydrocarbon supply conditions; RC: evaluation value of reservoir rock conditions; TC: evaluation value of trap conditions; CAPC: evaluation value of caprock & preservation conditions; 1: productive well; 0: non-productive well.

Exploratory well types can be discriminated using valid parameters. Figure 5 shows the variations of the four key geologic factors by well type. Non-productive and productive wells show different patterns in the probability distribution of each factor. To quantitatively demonstrate the validity of the key factors, we used the Kolmogorov-Smirnov (K-S) test to validate the correlation between the four geological variables (source rock conditions, reservoir rock conditions, trap conditions, and caprock & preservation conditions) and the class variable (exploratory well type). Under the null hypothesis of the K-S test, the two groups of samples of an attribute have the same distribution for the given class variable conditions, i.e., $\mathrm{cdf}(X \mid c = 0) = \mathrm{cdf}(X \mid c = 1)$. Table 6 shows the results of the K-S test for the four variables and class labels. The p values of the four geological variables were all < 0.05, indicating that the null hypothesis was rejected at the 5% significance level. In other words, the four geological attributes show significant differences in value distribution between productive and non-productive wells. Thus, the extracted geological attributes are demonstrated to be valid for oil and gas prediction.

Figure 5.

Distribution of the key parameters: (a) evaluation value of hydrocarbon supply conditions; (b) evaluation value of reservoir rock conditions; (c) evaluation value of trap conditions; (d) evaluation value of caprock & preservation conditions.

Table 6.

K-S test results.

	HSC	RC	TC	CAPC
p value	3.85E-07	2.40E-07	5.21E-20	0.003934
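The two-sample K-S test used for Table 6 can be sketched as follows. The function names are assumptions for this example, and the p value uses the standard asymptotic Kolmogorov series, which is only a rough approximation for small samples. The TC values are those of the eleven wells listed in Table 5 (rows 1 to 10 and 203), so the result illustrates the mechanics only and does not reproduce the 203-well p values in Table 6.

```python
from bisect import bisect_right
from math import exp, sqrt

def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:  # the supremum is attained at an observed value
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

def ks_pvalue_asymptotic(d, n_a, n_b, terms=100):
    """Asymptotic p value: 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)."""
    lam = d * sqrt(n_a * n_b / (n_a + n_b))
    if lam == 0.0:
        return 1.0
    series = sum((-1) ** (k - 1) * exp(-2.0 * k * k * lam * lam)
                 for k in range(1, terms + 1))
    # The series converges slowly near lam = 0, so clamp to [0, 1].
    return min(1.0, max(0.0, 2.0 * series))

# TC values of the wells shown in Table 5, split by well type.
tc_productive = [1.00, 1.00, 0.51, 1.00, 1.00, 0.68]   # well type 1
tc_non_productive = [0.67, 0.01, 0.36, 0.05, 0.07]     # well type 0

d_tc = ks_statistic(tc_productive, tc_non_productive)
p_tc = ks_pvalue_asymptotic(d_tc, len(tc_productive), len(tc_non_productive))
```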

In addition to the exploratory well data set, we used the two-dimensional Perpendicular Bisection grid technique to establish 15951 evaluation units in the study area. Each unit was acquired with four accumulation factors to form the grid data set, which was mainly applied to the prediction of spatial oil and gas distribution.

Experimental design and result analysis

Comparison method

To demonstrate the validity, GA-MD was compared with five popular methods for the prediction of spatial oil and gas distribution.

MD: Original Mahalanobis distance method (Xie et al., 2011).

TAN: Tree-augmented naive Bayes is a Bayesian network classifier (BNC) method, which establishes the maximum weighted spanning tree between geological attributes to characterize the relationships between the different attributes (Ren et al., 2020).

AODE: Average one-dependence estimator is an ensemble method that improves accuracy by integrating multiple BNC models (Guo et al., 2022b).

LR: Logistic regression is a generalized linear regression analysis model that establishes the discriminant relationship between geological attribute variables and class variables (Zhu et al., 2018).

SVM: Support vector machine is a generalized linear classifier for binary classification of data, and its decision boundary is the maximum-margin hyperplane solved from the learning samples (Chen et al., 2012).

Data sets and evaluation criterion

Data sets used are shown in Table 5. Accuracy is a common criterion for evaluating classification performance and is defined as the ratio of the number of correctly classified samples to the number of all samples. Generally, the higher the accuracy, the better the classifier (evaluation method). The accuracy of a known evaluation method $G$ for a test set $T = \{t_{1}, t_{2}, \dots, t_{m}\}$ with $m$ examples, where $t_{i} = \{x_{1}^{i}, x_{2}^{i}, \dots, x_{n}^{i}\}$, is defined as:

$$Acc(T \mid G) = \sum_{i = 1}^{m} \xi(c_{i}, \hat{c}_{i}) / m, \tag{5}$$

where $c_{i}$ is the true class label of the i-th test example; $\hat{c}_{i}$ is the class label of the i-th test example predicted using method $G$; and $\xi(c_{i}, \hat{c}_{i})$ is a binary function: $\xi(c_{i}, \hat{c}_{i}) = 1$ for $c_{i} = \hat{c}_{i}$; otherwise, $\xi(c_{i}, \hat{c}_{i}) = 0$.
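Eq. (5) translates directly into a few lines of Python. The function name and the example labels below are hypothetical and serve only to show the calculation.

```python
def accuracy(true_labels, predicted_labels):
    """Eq. (5): share of test examples whose predicted label is correct."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    # xi(c, c_hat) is 1 when the labels agree and 0 otherwise.
    hits = sum(1 for c, c_hat in zip(true_labels, predicted_labels) if c == c_hat)
    return hits / len(true_labels)

# Hypothetical labels: 1 = productive well, 0 = non-productive well.
example_accuracy = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```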

Results and discussion

(I) Validation of the proposed method

We compared GA-MD with MD, TAN, AODE, LR, and SVM. For GA-MD, the parameters were set to be the population P=20 and the maximum iterations MaxIt=5. The parameters of other algorithms were set in accordance with the corresponding references. In each test, 90% of samples were randomly extracted as the training set from the data set with 203 exploratory wells, and 10% of samples were used as the test set. Each algorithm was run 10 times independently, and the average of 10 results was taken as the accuracy of the corresponding method.
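The evaluation protocol described above (a random 90%/10% split, repeated 10 times and averaged) can be sketched as follows. The function name, the `fit_predict` callback interface, the rounding of the test-set size, and the toy data are assumptions for illustration. The text does not say whether the splits were stratified, so plain random shuffling is assumed.

```python
import random

def repeated_holdout(samples, labels, fit_predict, runs=10,
                     test_fraction=0.1, seed=0):
    """Average test accuracy over several random train/test splits.

    fit_predict(train_x, train_y, test_x) must return predicted labels
    for test_x; it stands in for any of the compared classifiers.
    """
    rng = random.Random(seed)
    n = len(samples)
    n_test = max(1, round(n * test_fraction))
    accuracies = []
    for _ in range(runs):
        indices = list(range(n))
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        predicted = fit_predict([samples[i] for i in train_idx],
                                [labels[i] for i in train_idx],
                                [samples[i] for i in test_idx])
        truth = [labels[i] for i in test_idx]
        hits = sum(1 for p, t in zip(predicted, truth) if p == t)
        accuracies.append(hits / n_test)
    return sum(accuracies) / runs

# Toy data (hypothetical): one attribute, label 1 when it is at least 0.5.
toy_x = [[i / 202] for i in range(203)]
toy_y = [1 if x[0] >= 0.5 else 0 for x in toy_x]

def threshold_rule(train_x, train_y, test_x):
    # Placeholder classifier that ignores the training data.
    return [1 if x[0] >= 0.5 else 0 for x in test_x]

mean_accuracy = repeated_holdout(toy_x, toy_y, threshold_rule)
```

With 203 wells and a 10% test fraction, this sketch holds out 20 wells per run and trains on the remaining 183.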

Table 7 lists the accuracy of the GA-MD method and the five comparison methods on the test set. As shown in Table 7, the accuracy of GA-MD was the highest, reaching 86.67%, and the accuracies of AODE, TAN, MD, SVM, and LR were 85%, 84%, 82.8%, 82.7%, and 81.63%, respectively. The analysis shows that the six methods fall roughly into three categories. (1) The MD, SVM, and LR methods have accuracies of 82.8%, 82.7%, and 81.63%, respectively. Their accuracy is not low in absolute terms but is lower than that of the other methods. The main reason is that these methods are conventional statistical models; when there is a large amount of data to be analyzed, they have shortcomings compared with the intelligent statistical AODE and TAN methods based on big data research. (2) The AODE and TAN methods have accuracies of 85% and 84%, respectively, which are relatively high. This is because the data set has a large and balanced sample size; if the data size were small, their predictions would not necessarily be better than those of the MD, SVM, and LR methods. (3) The GA-MD method, which is based on MD but further optimized, has the advantage of high accuracy. High accuracy can be achieved when operational efficiency is not the primary concern. The above analysis shows that the GA-MD method not only can solve the oil and gas distribution prediction problem but also improves accuracy compared with the currently popular methods.

Table 7.

Accuracies of different methods.

Method	SVM	LR	AODE	MD	TAN	GA-MD
Accuracy	81.63%	82.7%	85%	82.8%	84%	86.67%

(II) Validation of the probability map of oil and gas spatial distribution

The GA-MD model was trained using the training set with 203 exploratory wells and then applied to the calculation of the posterior petroleum probability for the samples in the grid data set. The contour map of hydrocarbon probability was made through interpolation, in which the units with the probability above 50% may have hydrocarbon accumulations. At the same time, we also used MD, TAN and AODE methods to predict the hydrocarbon spatial distribution.

Figure 6(a)–(d) show the oil and gas probability results predicted by MD, TAN, AODE, and GA-MD, respectively. The maps use different colors to characterize the likelihood of oil and gas at any site in the whole region. Figure 6 shows that the probability of hydrocarbon occurrence predicted by all four methods is highest in the discovered reserves area (the blue curve), indicating that the predictions of the four methods are highly consistent with the actual exploration results. A good model not only predicts well in the known reserves area but also has a certain trend extrapolation ability, that is, the ability to predict undiscovered oil and gas areas. Although the four methods all predict well in the discovered oilfields, they show different distribution patterns outside the reserves. The oil and gas probability distribution patterns in Figure 6(a), (b), and (d) are obviously different. Figure 6(a) and (b) show more large areas with a high probability of hydrocarbon distribution. These high-probability areas do not conform to geological laws in their distribution form and clearly exceed existing geological knowledge in their extent. The distribution patterns of hydrocarbons in Figure 6(c) and (d) are similar, and the prediction results are better, but there are still some differences. The distribution of hydrocarbons in the study area is controlled by many different factors. Table 6 indicates that trap conditions are the most important, followed by reservoir and hydrocarbon supply conditions, and finally caprock conditions. Through comprehensive analysis of the factors that control the distribution of hydrocarbons and of the prediction results, we find that the areas with a high probability of hydrocarbon occurrence in Figure 6(d) are basically located in areas with higher hydrocarbon supply and caprock evaluation values, while some areas with a high probability of hydrocarbon occurrence in Figure 6(c) are located in areas with lower hydrocarbon supply and caprock evaluation values. This indicates that GA-MD is more sensitive to hydrocarbon supply and caprock information than AODE. Based on the above analysis, the prediction results in Figure 6(d) are considered relatively objective, and the shape of the probability map is more in line with current exploration knowledge and geological laws.

Figure 6.

Prediction results of oil and gas spatial distribution in the Sangonghe Formation in the hinterland of the Junggar Basin(a: MD; b: TAN; c: AODE; d: GA-MD).

(III) Prediction of favorable areas for spatial distribution of oil and gas

Figure 6(d) shows that inside the areas with booked reserves, drilled productive wells occur in and close to the red zones with high hydrocarbon probability, and drilled non-productive wells occur in the zones (white color) with low oil probability. This means that the prediction agrees with drilling results. There are some prospects with the probability above 50% outside the areas with booked reserves, which could be classified into three types (Figure 7).

Type-I prospects extend in the perimeter of the areas with booked reserves, as plotted in Figure 7 by red solid lines in the mid-south and central parts.

Type-II prospects are undrilled new areas, as plotted in Figure 7 by red dash-dot lines in the southeast and northwest.

Type-III prospects are the areas with geologic complexities, where there were some hydrocarbon discoveries but no booked reserves, as plotted in Figure 7 by red dotted lines in the middle and northeast.

Figure 7.

Sangonghe hydrocarbon probability map derived from GA-MD.

Type-I prospects predicted with high confidence are the focus of extension and rolling exploration. Type-II prospects predicted with high uncertainties are the focus of venture exploration activities. Type-III prospects with geological complexities require further investigation before progressing with drilling deployment and booking reserves.

Conclusions

This paper presents a novel MD method based on a genetic algorithm for variable weight optimization and applies it to the prediction of spatial oil and gas distribution. The conclusions are as follows:

A case study of the Sangonghe Formation in the Junggar Basin shows that the accuracy of GA-MD was 86.67%, which was higher than that of the five other popular methods (MD, TAN, AODE, LR, and SVM). This demonstrates the validity and advantages of GA-MD. Compared with the other methods, GA-MD also shows its superiority and accuracy in probability map prediction.

The accuracy of GA-MD is higher than that of MD in theory and practical application, which plays an important role in improving economic benefits of oil and gas exploration and production with huge investment.

Through geological analysis and K-S verification of the training set, it is found that the traps in the study area play an important role, and it is suggested to give priority to finding traps in the next exploration.

According to the prediction results of GA-MD for oil and gas accumulation in the Sangonghe reservoir, three types of prospects were predicted outside the areas with booked reserves. Type-I prospects predicted with high confidence are the focus of extension and rolling exploration. Type-II prospects predicted with high uncertainty are the focus of venture exploration activities. Type-III prospects with geological complexities require further investigation before progressing with drilling deployment and booking reserves. GA-MD prediction will facilitate exploration risk reduction in the study area and improve decision making for drilling optimization.

Although the GA-MD method proposed in this paper is superior to the MD method, it still has the following limitations: (1) There are many parameters that are not easy to determine; (2) In a finite number of steps, the genetic algorithm may not find the global optimal solution; (3) The time complexity is high. Therefore, our future work will focus on these limitations to optimize the GA-MD method and improve its performance. In addition, GA-MD performs well in the study area of this paper, but its performance in other areas needs further research.

Footnotes

Acknowledgments

We thank the editors and anonymous reviewers for their valuable comments on this paper.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was funded by major S&T projects of China National Petroleum Corporation, including Research on Key Technologies of Fine Exploration in Mature Exploration Areas (No. 2021DJ07), Evaluation Methods/Parameters and Potential of Shale Oil Resources (No. 2019E-2601), and Shale Oil Exploration and Development Technologies (No. 2021DJ18).

ORCID iD

Hongjia Ren


A novel Mahalanobis distance method for predicting oil and gas resource spatial distribution
