Description
Defining templates of galaxy spectra is useful to quickly characterise new observations and organise databases from surveys. These templates are usually built from a pre-defined classification based on other criteria. We present an unsupervised classification of 702248 spectra of galaxies and quasars with redshifts smaller than 0.25 that were retrieved from the Sloan Digital Sky Survey (SDSS) database, Data Release 7. The spectra were first corrected for redshift, then wavelet-filtered to reduce the noise, and finally binned to obtain about 1437 wavelengths per spectrum. Fisher-EM, an unsupervised clustering algorithm based on a discriminative latent mixture model, was applied to these corrected spectra, considering the full set as well as several subsets of 100000 and 300000 spectra. The optimum number of classes given by a penalised likelihood criterion is 86, with the 37 most populated classes gathering 99% of the sample. These classes are established from a subset of 302214 spectra. Using several cross-validation techniques, we find that this classification agrees with the results obtained on the other subsets, with an average misclassification error of about 15%. The large number of very small classes tends to increase this error rate. In this paper, we make a first quick comparison of our classes with the templates of Kennicutt (1992), Dobos et al. (2012), and Wang et al. (2018). This is the first time that an automatic, objective and robust unsupervised classification has been established on such a large number of galaxy spectra. The mean spectra of the classes can be used as templates for a large majority of galaxies in our Universe.
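The first and last preprocessing steps described above (redshift correction and binning) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `to_rest_frame` and `bin_spectrum`, the equal-width binning scheme, and the toy values are all assumptions made for illustration, and the wavelet-filtering step is omitted entirely.

```python
from statistics import fmean


def to_rest_frame(wavelengths, z):
    """Shift observed wavelengths to the rest frame: lambda_rest = lambda_obs / (1 + z)."""
    if z <= -1:
        raise ValueError("redshift must be greater than -1")
    return [w / (1.0 + z) for w in wavelengths]


def bin_spectrum(wavelengths, fluxes, n_bins, lo, hi):
    """Average fluxes in n_bins equal-width wavelength bins spanning [lo, hi).

    Equal-width bins are an assumption; the source does not state the scheme.
    Returns (bin_centres, mean_flux); empty bins are reported as None.
    """
    width = (hi - lo) / n_bins
    buckets = [[] for _ in range(n_bins)]
    for w, f in zip(wavelengths, fluxes):
        if lo <= w < hi:
            idx = min(int((w - lo) / width), n_bins - 1)
            buckets[idx].append(f)
    centres = [lo + (i + 0.5) * width for i in range(n_bins)]
    means = [fmean(b) if b else None for b in buckets]
    return centres, means


# Toy spectrum (hypothetical values) observed at z = 0.25.
observed = [5000.0, 5010.0, 5020.0, 5030.0]
flux = [1.0, 3.0, 2.0, 4.0]

rest = to_rest_frame(observed, 0.25)
centres, binned = bin_spectrum(rest, flux, n_bins=2, lo=4000.0, hi=4032.0)
```

Placing every spectrum on a common rest-frame wavelength grid is what allows a clustering algorithm to compare spectra feature by feature; in this toy case the four observed points collapse to two binned values.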