Abstract
High redshifts and broad emission lines severely hamper quasar discovery and observation. To address this problem, a new redshift calculation method based on piecewise Gaussian fitting is proposed. First, the denoised and normalized spectrum is divided into peak and non-peak regions by mean square error threshold segmentation. Then, the non-peak regions are used to fit the continuous spectrum, which is subtracted to obtain the residual spectrum. Next, the peak of each segment in the residual spectrum is fitted precisely with a single-peak Gaussian, replacing the original multi-peak Gaussian fitting. Finally, the redshift value is obtained by matching the accurate peak values with the static template. Compared with traditional methods, the proposed method improves the precision of both continuous spectrum fitting and redshift calculation. Its effectiveness and accuracy have been verified by experiments on Sloan Digital Sky Survey data.
Introduction
Quasars are distant objects with very high redshifts, which are caused by the expansion of space between the quasars and the Earth. More than 200,000 quasars have been observed, mainly by the Sloan Digital Sky Survey (SDSS). The observed quasar spectra have redshifts ranging from 0.056 to 7.085.1,2 The volume of astronomical data at different wavebands is growing dramatically with the continuing sky surveys carried out by large space-based and ground-based telescopes, such as SDSS, the Large Sky Area Multi-Object Fibre Spectroscopic Telescope (LAMOST), FIRST, and the Two Degree Field (2dF) Redshift Survey. The existing and forthcoming astronomical databases are too large for traditional analysis techniques. 3 In the next decade, the ongoing FAST project (Five-hundred-meter Aperture Spherical radio Telescope) will inevitably face this severe challenge. Redshift is the most important physical parameter of a quasar; it is characterized by the relative difference between the observed and static wavelengths (or frequencies) of an object. 4 The redshift can be calculated as
$$z = \frac{\lambda_{\mathrm{obs}} - \lambda_{\mathrm{rest}}}{\lambda_{\mathrm{rest}}} \qquad (1)$$
where $\lambda_{\mathrm{obs}}$ is the observed wavelength and $\lambda_{\mathrm{rest}}$ is the static (rest) wavelength of the same spectral line.
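As a quick numerical illustration of this definition, the sketch below computes the redshift as the relative difference between an observed and a static wavelength. The function name `redshift` and the example line (Lyα, with a rest wavelength of about 1215.67 Å) are illustrative assumptions and are not taken from this article.

```python
def redshift(observed_wavelength, rest_wavelength):
    """Relative difference between the observed and static (rest) wavelengths."""
    if rest_wavelength <= 0:
        raise ValueError("rest wavelength must be positive")
    return (observed_wavelength - rest_wavelength) / rest_wavelength


# Hypothetical example: a line with rest wavelength ~1215.67 angstrom
# observed at three times that wavelength.
z_example = redshift(3 * 1215.67, 1215.67)
```

A line observed at its rest wavelength gives a redshift of zero, and a line observed at three times its rest wavelength gives a redshift of about 2.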
At present, research on quasar redshift identification is mainly concerned with the template matching method. The earliest and most classic template matching algorithm is the cross-correlation method proposed by Tonry and Davis. 5 Glazebrook improved Tonry’s method by replacing the individual templates with simultaneous linear orthogonal templates. 6 This improvement effectively eliminates the mismatch between templates and data and provides a better error estimation.5–7 However, this method, PCAZ, can only be applied to measure small redshifts because of the wavelength range restriction of the orthogonal templates. In addition, these methods usually require a highly complete set of templates.
Recently, machine learning algorithms have been applied to find quasars in astronomy. Neural networks, K-means, k-nearest neighbors, Gaussian modeling, and many other methods have effectively improved the accuracy and computable range of redshift calculation.8–13 However, identification methods based on spectra are very sensitive to the accuracy of the characteristic lines, and quasars have few and broad emission lines, which makes the characteristic lines difficult to extract.14,15 This article aims to improve the original automatic peak identification and to obtain accurate characteristic lines.1–3 The piecewise Gaussian fitting (PGF) divides the spectrum into peak and non-peak regions. The method could be adapted to provide robust automatic redshifts for spectra with broad emission lines and large redshifts. For the SDSS data, there was a substantial improvement in the reliability of the assigned redshifts and a lowering of the redshift uncertainties.
The rest of this article is organized as follows. Section “Data preparation” describes the spectral data used in this article and the spectral pre-processing. In section “Characteristic lines extraction based on PGF,” the mean square deviation is used to identify the peak regions where the characteristic lines are located. This not only avoids the broad-line problem when fitting the continuum but also allows the parameters of each peak to be identified one by one. In section “Redshift calculations and simulation verification,” PGF is used to compute the peak parameters and redshift values, and the simulation results of the algorithm are presented. Section “Conclusion” concludes our research.
Data preparation
All the spectra in our experiment are from SDSS. SDSS is a major multi-filter imaging and spectroscopic redshift survey using a dedicated 2.5-m wide-angle optical telescope at Apache Point Observatory in New Mexico, USA. The survey was designed to map one-quarter of the entire sky in detail with five broadband filters, determining the positions and absolute brightness of more than 100 million celestial objects. Data collection began in 2000, and the final imaging data release covers over 35% of the sky, with photometric observations of around 500 million objects and spectra for more than 3 million objects. The main galaxy sample has a median redshift of z = 0.1; there are redshifts for luminous red galaxies as far as z = 0.7 and for quasars as far as z = 5; and the imaging survey has been involved in the detection of quasars beyond a redshift of z = 6. The spectra cover the wavelength range 4000–9000 Å. The peak search method is sensitive to the noise level, because noise peaks dominate in most low signal-to-noise ratio (SNR) spectra. We therefore selected spectra with an average SNR > 5. The spectra must also be pre-processed in advance.
Spectral denoising
The noise in the spectrum is assumed to be similar to white noise, so a median filter is selected for denoising. Median filtering is a commonly used nonlinear smoothing filter whose basic principle is that each point of the spectral sequence is replaced by the median value of all points in the sliding window. We use the median filter method to extract the continuous spectrum of the quasar spectrum. A sliding window size of 60 nm was selected after extensive experimentation.16,17 The filtering effect is shown in Figure 1.

Figure 1. Spectrum (a) before and (b) after being filtered.
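The sliding-window median described above can be sketched in a few lines. This is a minimal illustration, assuming a uniformly sampled spectrum and a window given as an odd number of samples; the function name `median_filter`, the edge padding, and the toy input are assumptions, and the conversion of the 60 nm window into a sample count depends on the sampling of the data and is not attempted here.

```python
import numpy as np


def median_filter(flux, window=5):
    """Replace each sample by the median of the samples in a sliding window."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number of samples")
    flux = np.asarray(flux, dtype=float)
    half = window // 2
    # Repeat the edge values so the output has the same length as the input.
    padded = np.pad(flux, half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    return np.median(windows, axis=1)


# Toy input (not SDSS data): a flat spectrum with a single noise spike.
noisy = np.ones(21)
noisy[10] = 100.0
smoothed = median_filter(noisy, window=5)
```

Because the median ignores isolated outliers, the single spike is removed entirely, whereas a mean over the same window would only spread it out.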
Continuous spectrum fitting
The extraction of the characteristic lines must first take into account the influence of the continuum. Because the continuum obscures the true intensity of the spectral lines and prevents it from being measured accurately, the continuum must be removed. Many articles use filtering techniques to fit the continuum, but quasars have broad emission lines, and it is often difficult to obtain an ideal fit with a filtering algorithm.
To address the continuum fitting problem caused by the broad emission lines of quasars, an RMS (root mean square) error comparison is used to divide the spectrum into quasi-peak and non-peak regions. The regions greater than 3δ (

Figure 2. Peak region division.

Figure 3. Continuous spectrum.
The areas between each pair of asterisks in Figure 2 are the regions where the broad emission lines, and therefore the characteristic lines, are located. These regions are difficult to filter out, which is a known problem in quasar spectral pre-processing. At this stage, however, our purpose is only to obtain the continuous spectrum, so accurate peak locations are not yet necessary. The algorithm therefore only needs the RMS error to find the approximate position of each peak region and remove it.
The residual spectrum or the emission spectrum is obtained by subtracting the continuum from the denoised spectrum, as shown in Figure 4.

Figure 4. Residual spectrum.
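The segmentation and continuum removal described in this section can be sketched as follows. This is only one possible reading, under stated assumptions: δ is taken as the RMS deviation of the spectrum from its global median, samples deviating by more than 3δ are flagged as peak regions, each flagged region is widened by a hypothetical `pad` parameter to cover the line wings, and the continuum is modeled as a low-order polynomial fitted to the non-peak samples. The function names, the polynomial degree, and the synthetic spectrum are all illustrative and do not come from the article.

```python
import numpy as np


def split_peak_regions(flux, k=3.0, pad=10):
    """Flag samples whose deviation from the median exceeds k times the RMS deviation."""
    flux = np.asarray(flux, dtype=float)
    deviation = flux - np.median(flux)
    delta = np.sqrt(np.mean(deviation ** 2))  # RMS deviation (assumed meaning of delta)
    mask = np.abs(deviation) > k * delta
    if pad > 0:
        # Widen each flagged region by `pad` samples on both sides.
        kernel = np.ones(2 * pad + 1)
        mask = np.convolve(mask.astype(float), kernel, mode="same") > 0.5
    return mask


def remove_continuum(wavelength, flux, peak_mask, degree=2):
    """Fit a polynomial continuum to the non-peak samples and subtract it."""
    wavelength = np.asarray(wavelength, dtype=float)
    flux = np.asarray(flux, dtype=float)
    fit = np.polynomial.Polynomial.fit(wavelength[~peak_mask], flux[~peak_mask], degree)
    continuum = fit(wavelength)
    return continuum, flux - continuum


# Synthetic spectrum (not SDSS data): sloped continuum plus one broad emission line.
wavelength = np.arange(4000.0, 9001.0, 5.0)
flux = (1.0 + 1e-4 * (wavelength - 4000.0)
        + 5.0 * np.exp(-(wavelength - 6000.0) ** 2 / (2 * 30.0 ** 2)))
peak_mask = split_peak_regions(flux)
continuum, residual = remove_continuum(wavelength, flux, peak_mask)
```

Excluding the peak regions before fitting keeps the broad line from pulling the continuum upward, which is the difficulty with plain filtering noted above.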
Characteristic lines extraction based on PGF
The Gaussian function has the form of the normal distribution function. In practice, many SED (spectral energy distribution) patterns can be described by a Gaussian curve. Although the Gaussian curve is a nonlinear function, its parameters have a reasonable physical meaning. The method also has advantages in simplicity of calculation, ease of programming, and ease of dissemination.18,19
There are many methods for determining the peak position, that is, for acquiring the characteristic spectral wavelength. These include the derivative method, the Lorentz curve fitting method, and the multi-peak simultaneous fitting method. However, a recurring problem is their low accuracy. The main reason is that multiple peaks are usually calculated at the same time, and under the influence of noise and sky light it is easy to obtain false peaks and wrong peak positions. For redshift calculations, this inevitably introduces errors or even completely wrong results. In view of this, PGF is put forward using the results of the variance segmentation. Each individual peak region is treated as a single Gaussian distribution. Fitting each region individually avoids the identification errors introduced by multiple peaks and multiple parameters. As a result, peak parameters such as the peak position and line width are obtained more accurately, and redshifts can be calculated from them. The Gaussian function can be expressed as
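$$y = H \exp\left[-\frac{(x - c)^2}{2\sigma^2}\right] \qquad (2)$$
where $c$ is the emission wavelength and $H$ is the relative intensity of the spectral line; $c$, $H$, and the line width $\sigma$ are the parameters to be estimated.
Assuming that there is a set of data $(x_i, y_i)$, $i = 1, 2, \ldots, n$, that can be described by the Gaussian function, taking the natural logarithm of both sides gives
$$\ln y_i = \ln H - \frac{c^2}{2\sigma^2} + \frac{c}{\sigma^2} x_i - \frac{1}{2\sigma^2} x_i^2$$
Suppose $Z_i = \ln y_i$, $b_0 = \ln H - c^2/(2\sigma^2)$, $b_1 = c/\sigma^2$, and $b_2 = -1/(2\sigma^2)$; then
$$Z_i = b_0 + b_1 x_i + b_2 x_i^2 \qquad (3)$$
Equation (3) in matrix form is expressed as
$$\begin{bmatrix} Z_1 \\ Z_2 \\ \vdots \\ Z_n \end{bmatrix} = \begin{bmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2 \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \\ b_2 \end{bmatrix}$$
abbreviated as
$$Z = XB$$
Using the least squares principle, the generalized least squares solution of the matrix $B$ is
$$B = (X^{\mathsf{T}} X)^{-1} X^{\mathsf{T}} Z$$
The estimated parameters then follow from $B$ as
$$\sigma = \sqrt{-\frac{1}{2 b_2}}, \qquad c = -\frac{b_1}{2 b_2}, \qquad H = \exp\left(b_0 - \frac{b_1^2}{4 b_2}\right)$$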

Figure 5. Fitting results of each peak: (a) peak 1, (b) peak 2, and (c) peak 3.
Table 1. Parameters of each peak obtained by Gaussian fitting.
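A single-peak Gaussian fit by least squares can be sketched as below. The sketch assumes the Gaussian is written as H·exp(−(x − c)²/(2σ²)), where the width symbol σ and the 2σ² convention are assumptions; taking the logarithm turns the model into a quadratic in x whose three coefficients are solved by linear least squares. The function name `fit_single_gaussian`, the centering of x, and the synthetic peak are illustrative choices.

```python
import numpy as np


def fit_single_gaussian(x, y):
    """Fit y = H * exp(-(x - c)^2 / (2 * sigma^2)) by linear least squares on ln(y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = y > 0  # the logarithm needs positive intensities
    if keep.sum() < 3:
        raise ValueError("need at least three positive samples")
    x0 = x[keep].mean()  # center x to keep the system well conditioned
    u = x[keep] - x0
    X = np.column_stack([np.ones_like(u), u, u ** 2])
    Z = np.log(y[keep])
    B, *_ = np.linalg.lstsq(X, Z, rcond=None)  # least squares solution of Z = X B
    b0, b1, b2 = B
    if b2 >= 0:
        raise ValueError("data do not look like a single peak")
    sigma = np.sqrt(-1.0 / (2.0 * b2))
    center = x0 - b1 / (2.0 * b2)
    height = np.exp(b0 - b1 ** 2 / (4.0 * b2))
    return float(center), float(height), float(sigma)


# Synthetic, noise-free peak (not SDSS data).
x = np.linspace(5900.0, 6100.0, 41)
y = 5.0 * np.exp(-(x - 6012.5) ** 2 / (2 * 30.0 ** 2))
center, height, sigma = fit_single_gaussian(x, y)
```

One caveat of the logarithmic form is that it gives low-intensity samples in the wings relatively large influence, so on noisy data it may help to restrict the fit to the upper part of the peak or to weight the samples.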
Redshift calculations and simulation verification
Redshift calculation
After obtaining the characteristic spectrum, the redshift is calculated as follows:
Step 1. Do spectral pre-processing, denoise the spectrum by median filtering, and normalize the denoised spectrum to obtain the filtered spectrum
Step 2. Calculate the RMS error
Step 3. Remove the continuous spectrum. Using the result of step 2, the continuous spectrum is obtained by fitting the segmented non-peak regions. Then subtract the continuous spectrum from the denoised spectrum to obtain the residual, or emission, spectrum.
Step 4. Carry out Gaussian fitting on each peak region to obtain the exact peak value, which is also the emission line set
Step 5. Refer to the static template in Lewis and Ibata 6 for redshift calculation. Suppose the template spectral characteristic wavelength set is
Step 6. Compare the calculated characteristic line values with the laboratory standard spectral line table to find the redshift and confirm the lines.
The central wavelength of the spectral line can be obtained from the emission line set
Table 2. Eight most distinct emission lines of the composite quasar.
Wavelength of Hβ is from Table 2 of Berk et al. 20

Figure 6. The rest quasar spectrum template.
Using the information provided above, we can find the redshift value through the following process:
Sort the n spectrum characteristic lines from small to large according to the center wavelength and store them in the array
Calculate the candidate redshift for each characteristic line. First, assume
For the ith remaining line in
Calculate the weight sum of the
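The matching process outlined above can be sketched as a simple voting scheme. Because the details of the candidate generation and weighting are not recoverable from the text, everything here is an assumption: each observed line is tentatively paired with each template line to produce a candidate redshift, the remaining observed lines are shifted back to the rest frame and counted as matches if they fall within a hypothetical `tolerance` of a template line, and the candidate with the largest weighted sum wins. The template wavelengths are approximate, commonly quoted rest values used for illustration and are not the article's table.

```python
# Approximate rest wavelengths in angstrom, for illustration only.
TEMPLATE_LINES = {
    "Ly-alpha": 1215.67, "C IV": 1549.06, "C III]": 1908.73, "Mg II": 2798.75,
    "H-beta": 4861.3, "[O III]": 5006.8, "H-alpha": 6562.8,
}


def match_redshift(observed_lines, template=TEMPLATE_LINES, tolerance=5.0, weights=None):
    """Return (redshift, score) of the candidate that matches the most template lines."""
    observed = sorted(observed_lines)  # sort lines by center wavelength
    weights = weights or {}
    best = None  # (score, negative mean error, redshift)
    for obs in observed:
        for rest in template.values():
            z = obs / rest - 1.0  # candidate redshift for this pairing
            if z < 0:
                continue
            score, errors = 0.0, []
            for other in observed:
                rest_frame = other / (1.0 + z)
                name, nearest = min(template.items(), key=lambda kv: abs(kv[1] - rest_frame))
                error = abs(nearest - rest_frame)
                if error <= tolerance:
                    score += weights.get(name, 1.0)  # weighted sum of matched lines
                    errors.append(error)
            if not errors:
                continue
            candidate = (score, -sum(errors) / len(errors), z)
            if best is None or candidate > best:
                best = candidate
    if best is None:
        raise ValueError("no candidate redshift found")
    return best[2], best[0]


# Hypothetical observed line centers: four template lines shifted to z = 2.
observed = [1215.67 * 3, 1549.06 * 3, 1908.73 * 3, 2798.75 * 3]
z_best, score = match_redshift(observed)
```

Ties in the score are broken by the smaller mean rest-frame error, which favors the candidate whose lines sit closest to the template.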
Simulation
In this section, we present the performance of our PGF in terms of peak recognition and redshift calculation. All the data in our experiment are from the Sloan Digital Sky Survey (SDSS) and cover the wavelength range from 4000 to 9000 Å. Figure 7 shows the SNR distribution of the spectra used in the experiment. A low SNR affects the extraction result.

Figure 7. Signal-to-noise ratio distribution of the data set.
First, 10 spectra with good SNR are selected; their identification, peak regions, and redshifts are shown in Figure 8. It can be seen that the ideal peak regions are obtained, and the errors between the actual and predicted redshift values are all within 0.02.

Figure 8. Computed redshifts, segmentation, and peaks for the first 10 spectra.
The accuracy test is shown in Figure 9, where the abscissa is the result obtained by our calculation and the ordinate is the result provided by SDSS. Correctly predicted points lie on the line of slope one, so the accuracy of the redshift calculation is quite satisfactory.

Figure 9. Comparison of calculation results.
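A comparison of this kind can be summarized numerically as well as graphically. The sketch below, with a hypothetical helper `redshift_accuracy`, computes the absolute error between calculated and reference redshifts and the fraction of spectra within a tolerance; the default of 0.02 mirrors the error bound quoted above, and the numbers in the example are placeholders, not results from the experiment.

```python
def redshift_accuracy(calculated, reference, tolerance=0.02):
    """Summarize absolute redshift errors against reference values."""
    if len(calculated) != len(reference) or not calculated:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(c - r) for c, r in zip(calculated, reference)]
    within = sum(1 for e in errors if e <= tolerance)
    return {
        "max_error": max(errors),
        "mean_error": sum(errors) / len(errors),
        "fraction_within": within / len(errors),
    }


# Placeholder values for illustration only (not experimental results).
calculated = [0.51, 1.99, 3.05]
reference = [0.50, 2.00, 3.00]
summary = redshift_accuracy(calculated, reference)
```

Points that lie on the slope-one line in the plot correspond to entries with near-zero error in this summary.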
To compare the redshift estimation performance of our PGF algorithm with that of support vector regression (SVR) and the backpropagation (BP) neural network, the redshift estimates obtained with these methods for 15 selected spectra are shown in Table 3. The estimation
Table 3. Comparison of other methods and our integrated approach.
SVR: support vector regression; BP: backpropagation; PGF: piecewise Gaussian fitting
Conclusion
In this article, a new method for calculating redshift is proposed based on previous achievements. The method effectively overcomes two problems: the continuous spectrum cannot be fitted well, and the peaks cannot be obtained accurately. The root of both problems is the broad emission lines of quasars. Unlike previous methods, the PGF method first obtains the peak regions through threshold segmentation and then obtains the characteristic spectral lines using Gaussian fitting within each peak region. The results show that this method is superior to the original method, which extracts all characteristic lines at once. With the progress of observation technology, more and more quasars will be observed, and our method can provide an effective strategy for identifying them and calculating their redshifts.
Footnotes
Handling Editor: Marcin Wozniak
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Natural Science Foundation of Shandong Province (grant nos 2016ZRE2703, ZR2017PD010, and ZR2017PA004), the National Natural Science Foundation of China (grant no. 11803017), and the China Postdoctoral Science Foundation (grant no. 2016M600538).
