Star Formation in Galaxies#
Chris Richardson, Elon University
Funding for this tutorial made possible through the CATL Scholars Program
This tutorial is intended for students to complete in groups. A version with solutions is available to instructors upon request.
Objectives#
To explore scaling relationships among star forming galaxies in SDSS
To understand how SDSS galaxy images and spectra can help us understand star formation rate (SFR) metrics
To make a connection between the current SFR of a galaxy and its star formation history
To provide a foundation to explore your own questions about star formation in galaxies
Learning Outcomes#
By the end of this tutorial you should be able to:
Use H\(\alpha\) to find the star formation rate in SDSS galaxies
Use the star forming main sequence to identify starburst galaxies and quiescent galaxies
Use SDSS image cutouts to explain bias in SFR metrics
Use D4000 as a proxy for star formation history
Prerequisites#
Computational
Basic Python skills (e.g., for loops, conditional statements, defining functions)
Basics of
matplotlib
visualization, SQL queries, and SciServer platform (e.g., see Britt Lundgren’s notebook which inspired the format for this notebook)Using the
astropy
package for fundamental constants and unit conversion
Astrophysics
Introductory knowledge of galaxy classification
Introductory understanding of stellar lifetimes and the HR diagram
Advanced conceptual understanding of applying the Case B approximation to counting photons
# Import relevant libraries
import pandas as pd # data analysis
import numpy as np # more data analysis
from matplotlib import pyplot as plt # another plotting library
from matplotlib.image import imread
from astropy.constants import c # useful constants
from astropy import units as u # useful for unit conversions
from astropy import coordinates as coords
from astropy.coordinates import SkyCoord
from astropy.io import fits
from astropy.io import ascii
from astroquery.sdss import SDSS
from astroquery.simbad import Simbad
# Python standard-libraries to download data from the web
from urllib.parse import urlencode
from urllib.request import urlretrieve
#only here to display images
from IPython.display import Image
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
Cell In[1], line 9
6 from matplotlib import pyplot as plt # another plotting library
7 from matplotlib.image import imread
----> 9 from astropy.constants import c # useful constants
10 from astropy import units as u # useful for unit conversions
11 from astropy import coordinates as coords
ModuleNotFoundError: No module named 'astropy'
# Make notebook have a clean appearance
pd.set_option('display.max_colwidth', None)
import warnings
warnings.filterwarnings('ignore')
Choosing a Sample of Galaxies#
Several SDSS schema will be used in this tutorial, each with unique data products helpful to investigate star formation. The SpecObj
schema contains basic information about the observation and the object, such as the redshift \(z\). The galaxy
schema provides useful photometric measurements for colors and images. The three GalSpec
schema each give products derived from spectroscopy. As you should know, the H\(\alpha\) emission line is a convienient indicator of the star formation rate. We want galaxies with quality measurements of this line strength, so a S/N restriction is specified. We also want the H\(\beta\) emission line, because as we will see below, this will assist us in accounting for dust absorption along our line of sight.
By using H\(\alpha\) as indicator of star formation, how does this limit your analysis?
How do you expect highly star forming galaxies to look?
Your answers to these questions:
# Create a query to fetch star forming galaxies
query = 'SELECT TOP 10000 s.plate, s.fiberid, s.mjd, s.ra, s.dec, s.z, s.zwarning, '
query += 'g.h_beta_flux, g.h_alpha_flux, '
query += 'g.h_beta_flux_err, g.h_alpha_flux_err, '
query += 'e.bptclass, e.lgm_tot_p50, '
query += 'i.lick_hd_a_sub, i.d4000_n_sub, '
query += 'i.lick_hd_a_sub_err, i.d4000_n_sub_err, '
query += 'p.petror90_r '
query += 'FROM GalSpecLine AS g, SpecObj AS s, GalSpecExtra AS e, GalSpecIndx AS i, galaxy AS p '
query += 'WHERE '
query += 's.specobjid = g.specobjid '
query += 'AND s.specobjid = e.specobjid '
query += 'AND s.specobjid = i.specobjid '
query += 'AND s.bestobjid = p.objid '
query += 'AND bptclass = 1 '
query += 'AND s.z BETWEEN 0.04 and 0.1 '#-- Lower limit needed for global parameters w/ small aperture
query += 'AND h_alpha_flux > h_alpha_flux_err*5.0 ' #-- Make sure Balmer lines have adequate S/N
query += 'AND h_beta_flux > h_beta_flux_err*5.0 '
query += 'AND d4000_n_sub > d4000_n_sub_err*5.0 '
query += 'AND h_alpha_flux_err > 0 ' #-- Consistency check
query += 'AND h_beta_flux_err > 0 '
query += 'AND h_alpha_flux > 2.7*h_beta_flux ' #-- Ensures proper continuum subtraction
query += 'AND lgm_tot_p50 > 0 '
query += 'AND sigma_balmer < 509.55 ' #-- Exclude the broad line emitting galaxies
query += 'AND s.class = \'GALAXY\' ' #-- Looking for galaxies, not stars
query += 'AND s.zwarning = 0 '
#print (query) # useful for debugging, remove first # to uncomment
# send query to SDSS
gals = SDSS.query_sql(query,data_release=16)
gals
plate | fiberid | mjd | ra | dec | z | zwarning | h_beta_flux | h_alpha_flux | h_beta_flux_err | h_alpha_flux_err | bptclass | lgm_tot_p50 | lick_hd_a_sub | d4000_n_sub | lick_hd_a_sub_err | d4000_n_sub_err | petror90_r | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 329 | 6 | 52056 | 177.62908 | -3.326442 | 0.078031 | 0 | 45.879730 | 273.12190 | 3.291739 | 4.542729 | 1 | 10.661870 | 3.828772 | 1.453698 | 1.001100 | 0.028119 | 6.909836 |
1 | 326 | 147 | 52375 | 172.07771 | -2.878846 | 0.095268 | 0 | 257.521400 | 785.83600 | 4.412366 | 7.641911 | 1 | 9.859731 | 7.313367 | 1.139154 | 1.037312 | 0.020886 | 6.646070 |
2 | 335 | 190 | 52000 | 189.26213 | -2.982288 | 0.088555 | 0 | 54.240750 | 203.45360 | 2.292731 | 3.075485 | 1 | 9.830482 | 6.429802 | 1.246805 | 1.304015 | 0.028645 | 4.525309 |
3 | 337 | 27 | 51997 | 194.14905 | -2.919338 | 0.052390 | 0 | 59.298840 | 234.12340 | 2.860254 | 3.917636 | 1 | 10.147940 | 4.229728 | 1.323284 | 1.003340 | 0.024767 | 6.746436 |
4 | 330 | 316 | 52370 | 176.92980 | -2.563193 | 0.076802 | 0 | 112.538500 | 510.88790 | 3.626529 | 6.436479 | 1 | 10.251690 | 4.821437 | 1.284835 | 1.115302 | 0.026129 | 4.442664 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
9995 | 1719 | 586 | 53876 | 229.20211 | 9.636919 | 0.085698 | 0 | 41.080730 | 162.25520 | 2.143597 | 2.654202 | 1 | 9.889796 | 4.391088 | 1.267068 | 1.423546 | 0.032088 | 6.319391 |
9996 | 1798 | 499 | 53851 | 199.25717 | 9.936388 | 0.052416 | 0 | 27.837730 | 209.10180 | 1.961622 | 3.422518 | 1 | 10.730490 | 0.963644 | 1.536464 | 1.222649 | 0.036792 | 7.227845 |
9997 | 1806 | 586 | 53559 | 208.88008 | 9.501891 | 0.070048 | 0 | 9.401968 | 40.19081 | 1.542465 | 1.394677 | 1 | 9.787393 | 1.583487 | 1.501869 | 1.816120 | 0.050408 | 10.510790 |
9998 | 1809 | 616 | 53792 | 212.79641 | 9.220711 | 0.079386 | 0 | 41.077510 | 190.89150 | 1.806039 | 2.547210 | 1 | 10.244090 | 5.054030 | 1.380123 | 1.052414 | 0.027612 | 4.488715 |
9999 | 1810 | 450 | 53794 | 213.27146 | 9.215585 | 0.062881 | 0 | 77.005220 | 273.67080 | 1.896502 | 2.989808 | 1 | 9.121378 | 3.194716 | 1.223580 | 1.290285 | 0.026648 | 2.047930 |
10000 rows × 18 columns
Dereddening the Spectrum#
In order to determine the star formation rate (SFR) in each galaxy, we need to get the H\(\alpha\) luminosity \(L\)(H\(\alpha\)) emitted from each galaxy. The SQL query gives us the observed flux \(F\)(H\(\alpha\)), but this doesn’t account for dust absorbing some of this light as it travels from a galaxy to the point of observation. Fortunately, we can correct for this “extinction” by making use of the fact that dust preferentially absorbs shorter wavelength light to “deredden” the spectrum. In ideal conditions without dust, we can assume that \(F\)(H\(\alpha\))/\(F\)(H\(\beta\)) = 2.86 in star forming regions. Let’s deredden the H\(\alpha\) flux.
# Deredden emission lines according to Calzetii+ 2000
# wavelength supplied in angstroms
def deredden(lum,wl,Hb,Ha):
if (wl >= 6300 and wl <= 22000):
wl = wl/1e4 # convert to microns
kwl = 2.659*(-1.857+1.040/wl)+4.05
elif (wl >= 1200 and wl < 6300):
wl = wl/1e4 # convert to microns
kwl = 2.659*(-2.156+1.509/wl-0.198/wl**2+0.011/wl**3)+4.05
else:
print ('Wavelength of range')
# Using k-values calc as above
kHa = 3.326
kHb = 4.598
ebmv = 2.5/(kHb-kHa)*np.log10(Ha/Hb/2.86) # 2.86 for SF; 3.1 for AGN
lum_dr = lum*10**(0.4*kwl*ebmv)
return lum_dr
####
Halpha_flux = gals['h_alpha_flux']
Hbeta_flux = gals['h_beta_flux']
Halpha_flux_dr = deredden(Halpha_flux,6563,Hbeta_flux,Halpha_flux)
Determining the Star Formation Rate#
According to the GalSpecLine
schema, the line flux is returned as 10\(^{-17}\) erg/s/cm\(^2\), so we first need to multiply our dereddened \(F\)(H\(\alpha\)) by 10\(^{-17}\) to get actual flux value. Luminosity has units of energy / second, and an erg is a unit of energy, so we’re almost there. Use the space provided below to convert \(F\)(H\(\alpha\)) to \(L\)(H\(\alpha\)) and name that variable Halpha_lum_dr
.
How can you get the distance to the galaxy in cm?
How does distance factor into converting flux to luminosity?
# your code for calculating Halpha_lum_dr
# Halpha_lum_dr =
We can now use the \(L\)(H\(\alpha\)) to determine the SFR in units of M\(_{\odot}\)/yr using the conversion from Kennicutt+ 1998.
Halpha_lum_dr = Halpha_lum_dr.value
sfr = np.log10(Halpha_lum_dr*7.9e-42)
gals['sfr'] = sfr
plt.hist(sfr,bins=20)
plt.xlim(-2.5,2.0)
plt.xlabel(r'SFR [M$_{\odot}$/yr]')
plt.ylabel('N')
plt.show()
NameErrorTraceback (most recent call last)
<ipython-input-8-5708a5e908ea> in <module>
----> 1 sfr = np.log10(Halpha_lum_dr*7.9e-42)
2
3 gals['sfr'] = sfr
4
5 plt.hist(sfr,bins=20)
NameError: name 'Halpha_lum_dr' is not defined
The Star Forming Main Sequence#
A common measure of galaxy mass is its stellar mass M\(_*\) in units of M\(_{\odot}\), however a galaxy’s mass can be defined in many ways. We can roughly envision that M\(_*\) is proxy for galaxy size since larger galaxies typically contain more stars.
Would you expect the number of star froming regions in a galaxy to increase or decrease as M\(_*\) increases?
How would expect the SFR to correlate with M\(_*\)? Explain.
Your answers to these questions:
#Mstar = gals['lgm_tot_p50'] #uncomment this line
mstar_array = np.linspace(8.0,12.0,100)
poly1d_fn = np.poly1d([1.192,-12.31])
plt.xlabel(r'M$_*$ [M$_{\odot}$]')
plt.ylabel(r'SFR [M$_{\odot}$/yr]')
plt.xlim(8.0,12.0)
plt.ylim(-2.5,1.5)
plt.scatter(Mstar,sfr,s=0.5)
plt.plot(mstar_array, poly1d_fn(mstar_array), 'k--',label='MS')
plt.legend(loc='lower right')
NameErrorTraceback (most recent call last)
<ipython-input-9-e4a2d5b20d2e> in <module>
10 plt.ylim(-2.5,1.5)
11
---> 12 plt.scatter(Mstar,sfr,s=0.5)
13 plt.plot(mstar_array, poly1d_fn(mstar_array), 'k--',label='MS')
14
NameError: name 'Mstar' is not defined
We can see that star forming galaxies lie on what is known as the star forming main sequence, which is a shout out to the stellar main sequence on the HR diagram. The SF main sequence is based on the correlation between SFR and M\(_*\) where “typical” star forming galaxies lie.
In terms of appearance, how would you expect SF galaxies with high SFR to compare to those with low SFR?
Would you expect them to be spirals or elliptical?
Blue in color or red in color?
Brighter or dimmer?
Your answer to this question:
Visualizing the Sample with Image Cutouts#
Once you have provided your answer above, let’s actually have a look at the images of galaxies that have high SFR and low SFR. We can create function to gather up the images for us. The function takes in an filtered array of galaxies to isolate those of interest and description of the filtering procedure to display at run time. The first block of code will display the high SFR rate galaxies and then next block will display low SFR galaxies.
#
# Function to display images
#
def get_images(gal_array,description):
# set thumbnail parameters
width=200 # image width
height=200 # height
pixelsize=0.396 # image scale
plt.figure(figsize=(15, 15)) # display in a 4x4 grid
subPlotNum = 1
i = 0
new_gals = gal_array
nGalaxies = len(new_gals)
for index in range(0,nGalaxies): # iterate through rows in the DataFrame
i = i + 1
if i<17:
print('Getting image of '+description+str(i)+' of '+str(nGalaxies)+'...')
if (i == nGalaxies):
print('Plotting images...')
scale=2.0*new_gals['petror90_r']/pixelsize/width
pos = coords.SkyCoord(new_gals['ra'][index], new_gals['dec'][index], unit=u.deg, frame='icrs')
cutoutbaseurl = 'http://skyservice.pha.jhu.edu/DR12/ImgCutout/getjpeg.aspx'
query_string = urlencode(dict(ra=new_gals['ra'][index], dec=new_gals['dec'][index],width=width, height=height, scale=scale[i]))
url = cutoutbaseurl + '?' + query_string
# this downloads the image
image_name = 'cutout_'+str(index)+'.jpg'
urlretrieve(url, image_name)
# Display the image
img = imread(image_name)
plt.subplot(4,4,subPlotNum)
subPlotNum += 1
plt.imshow(img) # show images in grid
plt.title(index) # show the object identifier (objId) above the image.
######
filtered_gals = gals[(gals['sfr']> -0.05) & (gals['sfr']< 0.05)] # only galaxies with high SFR
get_images(filtered_gals,'high SFR galaxy ') # call function with filtered galaxy array
filtered_gals = gals[(gals['sfr']> -1.55) & (gals['sfr']< -1.45)] # only galaxies with high SFR
get_images(filtered_gals,'low SFR galaxy ')
How did these results compare to your answer above?
It might seems intuitive to think that a high SFR means that more O- and B-type stars, which means blue colors in disk-like structures. However, while it’s true that most of the images above are disk-like, there’s a wide assortment of colors. In fact, there’s a variety colors and disk-like structures in both samples, suggesting that they have something in common (i.e., the MS).
Much like stars can deviate from the MS on the HR diagram, galaxies can deviate from the MS on the SFR-M\(_*\) diagram. At a given M\(_*\), starburst galaxies have 10x the SFR of MS galaxies, while quiescent galaxies have a 0.01x the SFR of MS galaxies. Use these definitions to add two more lines to the SFR-M\(_*\) plot denoting the location of starbursts and quiescent galaxies.
# your code to add starburst and quiescent demarcations
Understanding the sSFR Metric#
It becomes evident from this plot that when we want to identify a highly star forming galaxy, the metric we’re usually after is the specific star formation rate sSFR. It is defined as
\(\mathrm{sSFR}\equiv\frac{\mathrm{SFR}}{\mathrm{M}_*}\)
and represents ratio of current star formation to past star formation, or the growth rate of stellar mass in a galaxy. We can add the sSFR to our dataframe to explore the metric in detail.
ssfr = np.log10(10**sfr/10**Mstar)
gals['sSFR'] = ssfr
plt.xlim(-12,-8)
plt.xlabel(r'sSFR [yr$^{-1}$]')
plt.ylabel('N')
plt.hist(gals['sSFR'],bins=25)
plt.show()
In the cell below, display the images for a sample of starburst galaxies and a sample of quiescent galaxies. Do these galaxies align with what you orignally envisioned high and low SFR galaxies to look like?
# your code to display images of starburst and quiescent
While the sSFR is measure of the relative star formation taking place in a galaxy, the sSFR values aren’t that intuitive. A simple way to gain more intuition is by taking the inverse of the sSFR, which gives us
\(\tau_d = \frac{\mathrm{M}_*}{\mathrm{SFR}}\)
where \(\tau_d\) is the amount of time needed to form the current stellar population or the mass-doubling timescale. We can infer the SFHs of galaxies by comparing \(\tau_d\) to the Hubble time (i.e., the age of the universe). In the space below, compare the mass-doubling timescales of the galaxies to the Hubble time.
# your code to calculate doubling times and compare to the Hubble time
Which galaxies formed most of their stars long ago and which galaxies are forming most of their stars now?
Where do these galaxies fall relative to the SF main sequence?
How do the masses of these galaxies compare?
Your answers to these questions:
Creating a Link to SFHs#
We can use \(\tau_d\) to infer the star formation history (SFH) of a galaxy, but underlying blackbody curve in the continuum of galaxies also tells us about their SFH. The “break” at 400 nm (4000 Angstroms) is typically written as D4000 and serves as a measure for the discontinuity of the blackbody curve. It is typically defined as,
\(D4000 = \frac{\int_{4000}^{4100}f_{\lambda} d\lambda}{\int_{3850}^{3950}f_{\lambda} d\lambda}\)
where the integration over wavelength is shown in Angstroms. Under this definition, D4000 = 1.0 would represent a flat underlying spectrum, also known as the continuum. If D4000 > 1.0 then there’s a discontinuity in the continuum. Let’s take a look at the spectrum around 4000 Angstroms for starburst galaxy and a quiescent galaxy.
starbursts = gals[gals['sSFR']>-9.5]
plate = starbursts['plate'][0]
mjd = starbursts['mjd'][0]
fiber = starbursts['fiberid'][0]
spec = SDSS.get_spectra(plate=plate,fiberID=fiber,mjd=mjd)
wl = 10**spec[0][1].data['loglam']
flux = spec[0][1].data['flux']
ax = plt.axes()
plt.xlim(3800,4300)
plt.ylim(0,30)
plt.xlabel('$\lambda$')
plt.ylabel('$F_{\lambda}$')
ax.plot(wl, flux, '-k', label='spectrum')
quiescent = gals[gals['sSFR']<-11.5]
plate = quiescent['plate'][0]
mjd = quiescent['mjd'][0]
fiber = quiescent['fiberid'][0]
spec = SDSS.get_spectra(plate=plate,fiberID=fiber,mjd=mjd)
wl = 10**spec[0][1].data['loglam']
flux = spec[0][1].data['flux']
ax = plt.axes()
plt.xlim(3800,4500)
plt.ylim(10,40)
plt.xlabel('$\lambda$')
plt.ylabel('$F_{\lambda}$')
ax.plot(wl, flux, '-k', label='spectrum')
Do these plots make sense to you given what you know about D4000, starburst galaxies, and quiescent galaxies? Explain.
In the space below, explain how you would expect D4000 to correlate with sSFR.
Write a snippet of code to verify your answer.
Your answer to this question:
# your plot showing the correlation between sSFR and D4000
Follow Up Questions#
In this Lecture-Tutorial, we developed some intuition about star formation in galaxies. There are several avenues for further exploration that you can dig into further.
Construct a color magnitude diagram of the galaxies in our sample
Where do the low mass and high mass galaxies lie?
Are there trends with MS / starbursting / quiescent galaxies?
How would you expect D4000 to correlate with g-r color?
The WISE catalog of mid-IR photometry can be used to calculate SFRs using the W3 and W4 bands. Create a new SQL query to cross match SDSS galaxies with WISE.
What does the literature say about calculating SFRs from W3 and W4?
How do the SFRs determined in the optical and IR compare?