Abstract
We estimate the returns (measured by hourly earnings) to education, experience, and social networking in India using individual-level panel data from the India Human Development Surveys. We combined the two latest waves of this survey using individual-level identifiers to generate a balanced panel and merged it with various household characteristics. We provide estimates of private returns for an additional year of education and experience by consumption quintiles, gender, caste, and religion in a fixed-effects Heckman model that controls for selection bias. This methodology improves upon the estimates of earlier studies on earnings in India, as most of the literature has relied on cross-section or pseudo-panel data. We also examine the impact of social networking on earnings, which is under-explored in nationwide studies in India. We find that education significantly and positively affects earnings for all consumption quintiles, genders, castes (except scheduled castes), and religious groups. Among economic groups, the highest returns are observed for the third quintile above the poverty line. Returns to females for an additional year of education are nearly double those of males, but the difference in starting earnings keeps the earnings of males higher for long periods. Among the castes, scheduled castes have the highest returns to education; among religious groups, other minorities do. Social networking positively impacts the earnings of males, Hindus, and the quintile just above the poverty line. Experience positively impacts the earnings of women, the general and scheduled castes, Hindus and other minorities, and two consumption quintiles (two and five) above the poverty line.
Plain Language Summary
Purpose: Enhancing human capital is critical for India’s development. It would help overcome existing labor market hierarchies based on economic class, gender, religion, and caste. We study the impact on private earnings of (a) an additional year of education and experience, and (b) social networking. Methods: We use an individual-level panel dataset assembled from the two latest rounds of the India Human Development Survey (I and II). Our Heckman-type earnings equation controls for selection bias. Results: An additional year of education increased earnings by (a) 2.4% to 8.8% across consumption quintiles, (b) 3.7% for males and 5.2% for females, (c) 1.5% (STs) to 5.9% (SCs), and (d) 2.3% (Muslims) to 9.9% (Other Minorities). Experience increased the earnings of two economic groups, females, the General and Scheduled castes, and Hindus and Other Minorities. Social networking increased the earnings of males, Hindus, and one economic quintile. Conclusions: Higher marginal returns to education for females justify greater investment in women’s education. Lower returns to education among STs and Muslims indicate the need for affirmative action for these groups. The positive returns to education for the poor suggest that anti-poverty programs in combination with educational opportunities for the less privileged would meet the goals of social justice. Implications: Better education (SDG4) would help achieve gender equality (SDG5) and social justice for marginalized (economic, caste and religious) groups (SDG1, SDG10). Limitations: We were unable to account for ability bias. Stratification by state and sector would provide better estimates.
Keywords
Introduction
Do women and men get similar private returns to education and experience in India? Do caste, religion and social networking play a role in determining wages? If so, to what degree? These questions continue to hold center stage in household decision-making and development policy-making in countries such as India. They echo similar concerns in other parts of the world, including developed countries where identity issues—predominantly based on economic status, caste, race, gender, and religion—have been sites for discrimination.
In this paper, we re-examine the issue of private returns to education (earnings) in India. We use data from two rounds of the India Human Development Survey (IHDS) to construct a balanced individual-level panel data set, subsequently merged with household characteristics. The IHDS data set has been used in individual and household panel models (Akter & Chindarkar, 2020; Azam, 2018; E. Chatterjee & Sennott, 2021; Tamuly & Mukhopadhyay, 2022). However, to the best of our knowledge, this is the first attempt to explain earnings in India with an individual panel data model, with individual-level fixed effects, from a nationwide survey. We follow the Heckman selection model and use a new and improved command (xtheckmanfe) in Stata 16.1 (Rios-Avila, 2021) to control for selection bias (Mundlak, 1978; Semykina & Wooldridge, 2010; Wooldridge, 2018).
We contribute to the literature by providing the first estimates of the private returns to an additional year of education and experience by consumption quintiles, caste, gender, and religion. This improves upon Mitra’s (2019) method, which uses quintiles based on wages without separating the poor from the non-poor. Our analysis provides greater disaggregation, which can help in devising more effective and targeted policies for affirmative action. Previous studies of earnings in India have employed a cross-section data set (Agrawal, 2014; Bairagya, 2020; Geetha Rani, 2014; Kingdon & Theopold, 2008; Mitra, 2019) or a quasi-panel (Bhattacharya & Sato, 2017; Mendiratta & Gupt, 2013). We believe our panel data approach improves upon the estimates of earlier studies on this subject. We also control for the impact of social networking, which is less explored in wage determination models.
We found that education has a significant and positive effect on earnings for all consumption quintiles, genders, castes, and religious groups. Among economic groups, the highest returns are observed for the third quintile above the poverty line. Returns of additional years of education (marginal increase) to females are nearly double that of males. Among the castes, scheduled castes have the highest returns to education and among religious groups other minorities have the highest returns. Experience was found to be significant for two consumption quintiles above the poverty line, females, general castes and scheduled castes, Hindus and other religious Minorities. We also learned that social networking has an impact on the earnings of males, Hindus, and the quintile just above the poverty line. Our study also adds to the nascent literature on the impact of social networking on earnings in India.
Literature Review
Education is considered an investment in human capital which provides a pathway to overcome socio-economic and identity-based barriers. Accordingly, the human capital framework is often used to explain the earnings of individuals as a return on investment following the seminal contributions of Mincer (1958), Schultz (1961), and Becker (1962). An extensive empirical literature has emerged based on this framework (Acemoglu & Autor, 2012; N. Angrist et al., 2021; Beam et al., 2020; Becker, 1994; Collin & Weil, 2020; England & Folbre, 2023; Faggian et al., 2019; Goldin, 2016; H. A. Patrinos & Psacharopoulos, 2020). Conventional wage determination models use ordinary least squares (OLS) methods, but these have been demonstrated to suffer from selectivity and endogeneity bias (J. J. Heckman et al., 2006; Wooldridge, 2018). J. Heckman (1974) provides a mechanism to overcome the problem of selection bias in Mincer’s original formulation (Mincer, 1974). Apart from cross-country studies (H. A. Patrinos & Psacharopoulos, 2020; Peet et al., 2015), the Heckman selection model has since been used to estimate returns on education in different parts of the world such as the United States (J. D. Angrist & Krueger, 1991; J. Heckman & Polachek, 1974), Europe (Harmon et al., 2001), and China (Li, 2003).
The modified version of Mincer’s earnings function has been used to study the Indian labor markets (Bhaumik & Chakrabarty, 2008; Jacob, 2018; Vasudeva Dutta, 2006). The most widely used data set is from the National Sample Survey Organisation (NSSO), now renamed as the National Statistical Office (NSO) (Mohanty, 2021). The limitation of the NSSO data sets is that they are not amenable to panel data analysis directly, as they do not track the same household over time. Since cross-sectional studies have limitations and their findings are considered less reliable, panel data sets are preferred. Three such nationwide data sets are currently available in India: the India Human Development Surveys (IHDS) (Azam et al., 2013), the Periodic Labor Force Survey (PLFS) (NSO, 2021), and the Consumer Pyramids Household Survey (CPHS) of the Centre for Monitoring Indian Economy (CMIE) (Mamgain, 2021). The IHDS data has been used by numerous studies in the past (Akter & Chindarkar, 2020; Azam et al., 2013; E. Chatterjee & Sennott, 2021; Sarkar et al., 2019; Tamuly & Mukhopadhyay, 2022).
The PLFS has been in the public domain since 2017–2018 (NSO, 2021). It collects household data every quarter; however, an individual household is likely to be asked to provide information, in the form of a rotational panel, only once a year. This type of panel data has been used by Sengupta (2023) and Singha Roy (2020), among others. The CPHS data set, on the other hand, has been tracking the consumption of randomly selected households thrice a year since 2016. However, unlike the IHDS and the PLFS, this data set is subscription-based.
A few studies have created a regional pseudo-panel data set using district identifiers as proposed by Deaton (1985) and, thus, overcome the problems posed by estimates of cross-section data. Bhattacharya and Sato (2017) use this technique to analyze the effect of socioeconomic factors on the real wage rate of male workers in India. They find that caste and religion were not significant determinants of wages. This is contrary to the understanding of the Indian labor market, where gender, caste, and religion have been found to influence wages (Agrawal, 2014; Sengupta & Das, 2014).
Education is an important form of investment in the human capital (Becker, 1994). The gains from higher educational attainment can translate into higher earnings as well as create pathways for social mobility through historically determined social hierarchies, which are based on gender, caste, and religion (Sen, 2000, 2001). In this study, “earnings” is used interchangeably with wages and salaries received by individuals. In the Indian context, studies have shown evidence of labor market discrimination based on gender (Chakraborty et al., 2020), caste (Kumar & Hashmi, 2020; Madheswaran & Singhari, 2016), and religion (Dhesi & Singh, 1989). Therefore, to estimate the degree of discrimination, it is important to measure the differences in individual earnings on the basis of known social characteristics.
Material and Methods
Data
We use two of the latest IHDS rounds (round 1 conducted in 2004–2005, hereafter IHDS I, and round 2 conducted in 2011–2012, hereafter IHDS II). This data set emerged as a collaborative research program between the National Council of Applied Economic Research, New Delhi, India, and the University of Maryland, United States (Desai & Vanneman, 2005, 2015). Certain features of the IHDS data set make it unique in comparison to the commonly used data sets in Indian literature. First, topics in IHDS extend across indicators of caste, community, consumption, the standard of living, energy use, income, agriculture, employment, government subsidies, education, social and cultural capital, household and family structure, marriage, gender relations, fertility, health, village, infrastructure, among others. Second, the human development indicators documented are extensive and the data set contains a wide array of contextual measures. Third, the panel components allow for richer inferences to be drawn between time periods.
IHDS I is a nationally representative data set, covering 41,554 households in 1,503 villages and 971 urban blocks across India. The sample consists of 26,734 rural and 14,820 urban households. Of the 593 districts in India in 2001, 384 were included in IHDS I. The household sample in IHDS I is a composite of several separate subsamples that were each drawn somewhat differently. It contains a subsample of re-interviewed households, last interviewed in 1994 to 1995 for the Human Development Profile of India (HDPI). Therefore, the data set includes 13,900 households from the HDPI and a subsample of 27,654 new households. IHDS II covers 42,152 households in 384 districts, 1,420 villages, and 1,042 urban blocks located in 276 towns and cities. Most of the households from IHDS-I (83%) were re-interviewed for IHDS II. Both surveys cover all states and union territories of India except Andaman/Nicobar and Lakshadweep.
We first downloaded the Stata-formatted files from the IHDS repository, which is free to download for researchers after registration (Desai & Vanneman, 2005, 2015). We merged the individual records from the two rounds of IHDS and arrived at 150,995 individuals. Only individuals who had been interviewed in both rounds were retained for observation. The method to identify and merge these files is available from the IHDS repository. On completion of the merger, we cross-checked the resultant number of observations with IHDS documents to ensure that we had the same numbers as were derived from the master notes. The merged data set provided a balanced individual-level panel. Thereafter, we merged the household-level dataset from the two rounds to obtain a balanced household-level panel. Next, we linked the individual characteristics with the household characteristics by merging the individual panel with the household panel, which provided us with an extensive panel data set for a comprehensive analysis.
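The panel construction described above can be illustrated with a minimal sketch. It assumes each round is held as a dictionary keyed by a person identifier; the identifiers, field names, and values are hypothetical and are not the IHDS variable names or the authors' Stata code.

```python
# Toy records keyed by a hypothetical person identifier (not IHDS IDs).
round1 = {"p1": {"education": 5}, "p2": {"education": 8}, "p3": {"education": 10}}
round2 = {"p1": {"education": 7}, "p3": {"education": 12}, "p4": {"education": 3}}


def balanced_panel(first, second):
    """Keep only individuals observed in both rounds (an inner join on the ID)."""
    common_ids = sorted(first.keys() & second.keys())
    return {pid: {"round1": first[pid], "round2": second[pid]} for pid in common_ids}


panel = balanced_panel(round1, round2)
print(sorted(panel))
```

Individuals seen in only one round (here `p2` and `p4`) drop out, which is what makes the resulting panel balanced.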
The dependent variable in our analysis is the natural log value of the hourly earnings of individuals (lnHourly_Earnings). As discussed earlier, in this context, earnings include wages as well as salary work. IHDS I and II collected information pertaining to wage and salary work under the income and social capital heads. We have used the variables WS8hourly (hourly wage total) and WSEARNHOURLY (hourly wage and bonuses) from the IHDS I and II databases, respectively. To make monetary values in the two rounds—such as earnings and consumption—comparable, we have used a deflation factor provided by IHDS, which normalizes the 2005 values and makes them comparable to the 2012 values.
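As a rough illustration of how the dependent variable could be built, the sketch below applies a deflation factor to nominal hourly earnings and then takes the natural log. The factor value and the convention that it multiplies 2005 values to express them in 2012 prices are assumptions made for illustration; the actual IHDS factor is not reported in the text.

```python
import math


def log_real_hourly_earnings(nominal_hourly, deflation_factor=1.0):
    """Natural log of hourly earnings after rescaling to a common price level.

    deflation_factor is assumed to multiply the nominal value; pass 1.0 for
    observations that are already in the reference year's prices.
    """
    if nominal_hourly <= 0:
        raise ValueError("log earnings are defined only for positive earnings")
    return math.log(nominal_hourly * deflation_factor)


# Hypothetical factor of 2.0 for a 2005 observation; 2012 values left as-is.
ln_2005 = log_real_hourly_earnings(10.0, deflation_factor=2.0)
ln_2012 = log_real_hourly_earnings(20.0)
```

The `ValueError` branch also shows why individuals without reported earnings have no value of the dependent variable, which is the source of the selection problem addressed later.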
Sub-Sample Selection
Since our focus is on the earnings of eligible persons in the labor force, we have restricted our analysis to individuals belonging to the working age group (15–64 years)—the age group considered eligible to participate in the labor force (The World Bank, n.d.). To study the variation in earnings and education between the two rounds, we have excluded (a) those who were not interviewed in both rounds, (b) those who were still enrolled in school or college during IHDS II and thus ineligible to be in the labor market, and (c) those whose education level did not change between the two rounds.
Variables
Our dependent variable is the natural log of hourly earnings (lnHourly_Earnings) reported by each individual in the outcome equation, in keeping with the literature on this subject (Eisnecker & Adriaans, 2023; J. Heckman & Polachek, 1974; Hoffmann & Kassouf, 2005). The dependent variable in the selection equation is a binary variable that indicates whether a person is in the labor force or not (Labor Force Participation). We constructed this variable based on whether a person reported an earning. Among the covariates, we have used variables that are commonly used in the literature for predicting earnings: the individual’s education in completed years (Education), potential experience (Experience), and the square of potential experience (Experience_square).
We have used family characteristics like the number of children (Nchildren) (Mroz, 1987), marital status (Married) (Comola & de Mello, 2013), and the number of adults in a household (Nadults) (Ashraf et al., 2013; Kugler & Kumar, 2017), among others. In India, women are anticipated to exit the labor market when they become mothers (Jaumotte, 2003). This can be partly attributed to the patriarchal structure of Indian society, as the main burden of child-rearing falls on the mother (Bhambhani & Inbanathan, 2018). Therefore, the variable Nchildren is expected to adversely influence women’s presence in the labor market and the nature of their employment (Chun & Oh, 2002).
The family’s economic status is often reflected in their consumption expenditures. We have used household consumption expenditure to first classify households as below or above poverty. To create consumption quintiles, the entire sample was divided into two broad groups by level of consumption: those below the poverty line (BPL) and those above it (APL). We further disaggregated the APL group into five quintiles (labeled APL1 (low) to APL5 (high)).
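The two-step grouping described above (a poverty-line split followed by quintiles within the above-poverty group) might look like the following sketch. The consumption values, the poverty line, and the treatment of households exactly at the line as APL are illustrative assumptions.

```python
def classify_households(consumption, poverty_line):
    """Label each household 'BPL' or 'APL1' (low) to 'APL5' (high)."""
    labels = {hh: "BPL" for hh, c in consumption.items() if c < poverty_line}
    # Rank the above-poverty households by consumption (ties broken by ID).
    apl = sorted((c, hh) for hh, c in consumption.items() if c >= poverty_line)
    n = len(apl)
    for rank, (_, hh) in enumerate(apl):
        labels[hh] = f"APL{rank * 5 // n + 1}"
    return labels


# Hypothetical consumption values: 50, 60, ..., 160, with a poverty line of 70.
consumption = {f"h{i}": 50 + 10 * i for i in range(12)}
labels = classify_households(consumption, poverty_line=70)
```

Because the quintile cut-offs are computed only among APL households, the BPL group stays separate from the lowest APL quintile.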
Gender identities have been widely acknowledged as a determinant of employment and earnings across the world (Blau et al., 2012) and studies show that a gender wage gap exists in the labor market (Sengupta & Das, 2014). We have included the variable Female (binary values with Female = 1, Male = 0) to examine any gap in returns to education.
Social groups in India are broadly classified along two identities—caste and religion. The role of caste and religion in determining earnings has been examined before (Arabsheibani et al., 2018; Bhaumik & Chakrabarty, 2010; Geetha Rani, 2013). India also has a unique social hierarchical system based on caste (Deshpande, 2011), which has well-known implications for economic outcomes (Bhaumik & Chakrabarty, 2008; Lolayekar & Mukhopadhyay, 2020). The hierarchy places the general castes at the top of the social ladder, followed by the other backward castes (OBC) and scheduled castes (SC). The scheduled tribes (ST) are strictly not part of the caste system. Rather, they are indigenous groups and are ranked often at the bottom of the social hierarchy (GoI, 2016; Mosse, 2018). There have been many social movements to abolish social hierarchies based on caste (Ray & Katzenstein, 2005). The state policy has also undertaken affirmative action by providing educational and employment reservations for lower castes (Deshpande, 2013). However, there is evidence that caste hierarchies still exist (Arabsheibani et al., 2018). In IHDS I, there were five caste categories—namely Brahmin, OBC, SC, ST and others. In IHDS II, there were six caste categories—namely Brahmin, Forward /General (except Brahmin), OBC, SC, ST and others. In order to make the categorization uniform across the two rounds and to match the government classification for employment reservations, we have classified the caste data into four categories, namely General (Forward, Brahmin and Others together), OBC, SC and ST. We created categorical groups as a variable Caste (General = 1, OBC = 2, SC = 3, ST = 4).
Religion has been identified as a determinant of economic outcomes (Barro & McCleary, 2003). It is recognized as a site for determining social and economic hierarchy. Studies have found significant differences in development indicators based on religious groupings (Geetha Rani, 2013). A government report found that Muslims as a group have the lowest development indicators among all religious minority groups in India (GoI, 2006). In IHDS I and IHDS II, there were nine religion categories—namely Hindu, Muslim, Christian, Sikh, Buddhist, Jain, Tribal, Others, and None. In order to make the categorization tractable, we created three groups, namely Hindu, Muslims and Other minorities (which included all the remaining groups, i.e., Christian, Sikh, Buddhist, Jain, Tribal, Others and None). The new variable Religion was a categorical variable (Hindu = 1, Other minorities = 2, Muslims = 3).
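The caste and religion recodings described above can be sketched as simple lookups. The numeric codes follow the text; the raw category strings are hypothetical stand-ins, since the IHDS files store these categories under their own codes.

```python
CASTE_CODES = {"General": 1, "OBC": 2, "SC": 3, "ST": 4}
RELIGION_CODES = {"Hindu": 1, "Other minorities": 2, "Muslims": 3}

# Hypothetical raw labels mapped to the four harmonized caste groups.
RAW_CASTE_TO_GROUP = {
    "Brahmin": "General",
    "Forward/General (except Brahmin)": "General",
    "Others": "General",
    "OBC": "OBC",
    "SC": "SC",
    "ST": "ST",
}


def recode_caste(raw_caste):
    """Collapse the round-specific caste categories into the four-group code."""
    return CASTE_CODES[RAW_CASTE_TO_GROUP[raw_caste]]


def recode_religion(raw_religion):
    """Hindu and Muslim keep their own group; everything else is pooled."""
    if raw_religion == "Hindu":
        return RELIGION_CODES["Hindu"]
    if raw_religion == "Muslim":
        return RELIGION_CODES["Muslims"]
    return RELIGION_CODES["Other minorities"]
```

For example, `recode_caste("Brahmin")` and `recode_caste("Others")` both map to the General group, and `recode_religion("Sikh")` falls into Other minorities.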
Social networking has been found to influence individual earnings in China (Liu, 2017), United States (Smith, 2000), Sweden (Behtoui & Neergaard, 2010) as well as India (Deshpande & Khanna, 2021). We added an indicator of social networking to the outcome equation (Networking Intensity). We constructed this variable by utilizing information regarding a household’s membership in one or more social or economic networking groups (membership in co-operatives, caste or religious groups, self-help groups, among others) (Dasgupta, 2005; Deshpande & Khanna, 2021).
In keeping with the literature, we constructed a variable to measure potential experience, since the IHDS did not survey the experience of individuals directly (Lemieux, 2006). This variable is equal to the respondent’s age minus the years of education minus 5 years (to account for preschool years as relevant in the Indian context). We have capped the maximum experience at 45 years (the gap between the entry and exit age group for labor force participation). Potential experience implicitly assumes that individuals have been employed continuously since completing their education. We acknowledge that this assumption may not be universally valid but, under specific circumstances, it could reflect the labor market situation. We contend that during the years the two rounds of the IHDS surveys were conducted, the Indian economy experienced high growth and the labor market underwent informalisation. The data from the NSSO’s quinquennial rounds (2004–2005 and 2011–2012) suggest that the prevailing rates of unemployment in the overall population were 2.2% and 2.3% (Mehrotra, 2019). Therefore, even if there was underemployment in the economy, the actual percentage of unemployment recorded was relatively low.
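The construction of potential experience can be written as a one-line formula. The sketch below follows the description above (age minus years of education minus 5, capped at 45); the floor at zero is an added assumption to keep the toy function well behaved and is not stated in the text.

```python
def potential_experience(age, years_of_education, cap=45):
    """Potential experience: age - education - 5, limited to the range [0, cap]."""
    raw = age - years_of_education - 5
    return max(0, min(raw, cap))


# A 30-year-old with 10 years of schooling has 15 years of potential experience.
example = potential_experience(30, 10)
```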
Empirical Model
Mincer’s earliest form of the earnings function ran a single regression equation in the semi-log form (Mincer, 1974). The model used Education, Experience, and Experience_square as co-variates. Over the years, researchers have fed many variations in Mincer’s original model for use in different contexts. One of the reasons for using a quadratic form for experience was to account for any non-linear effect that it may have on earnings, which could emerge from a loss in productivity due to age. Mincer’s earnings function can be framed as:
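The displayed equation is not present in this version of the text. A generic statement consistent with the covariates named above, offered as a reconstruction rather than the original notation, would be:

```latex
\ln(\text{Earnings}) = f\left(\text{Education},\ \text{Experience},\ \text{Experience}^{2}\right) \tag{1}
```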
The early econometric method for estimating the Mincer earnings function used the ordinary least squares (OLS) equation (Hartog & Gerritsen, 2016; Polachek, 2008). The econometric formulation of Equation 1 would then be stated as:
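The displayed equation is not present in this version of the text. A reconstruction consistent with the symbols defined immediately below (β1 on Education, β3 on Experience_square, and the error term e_it), with β0 and β2 assumed by analogy, would be:

```latex
\text{lnHourly\_Earnings}_{it} = \beta_0 + \beta_1\,\text{Education}_{it} + \beta_2\,\text{Experience}_{it} + \beta_3\,\text{Experience\_square}_{it} + e_{it} \tag{2}
```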
where eit = stochastic error term and β1 = the average private returns to an additional year of education.
The coefficient of Experience_square (β3) helps establish whether the relationship has any second-order effects. However, it has been recognized that OLS estimates suffer from at least two sources of bias—selectivity and ability bias (J. D. Angrist & Krueger, 1991). Selecting only those individuals who reported earnings to estimate the returns to an additional year of education could overestimate the effect of education. This error in estimation, using only the sub-sample of individuals who reported earnings, is called selection bias (J. Heckman, 1974; J. J. Heckman, 1976, 1979). The issue of endogeneity is the other bias common in these estimations (Card, 1999). Individuals may embody innate abilities or genetic traits that aid productivity over and above additional years of education. The average private return to additional years of education may be an overestimation due to these unobservable traits. Children with greater ability are more likely to continue in school than others. The returns to education may be overestimated in such cases as well (H. Patrinos, 2016).
These issues of selection and endogeneity bias have been the focus of more recent research in the area of human capital investment and, specifically, on returns to education. Different methods have been proposed to tackle the problem of endogeneity. The most popular one is the instrumental variable (IV) approach (Carneiro et al., 2011). Instruments are variables that are strongly correlated with the endogenous covariate but uncorrelated with the error term of the outcome equation, so that they affect the outcome only through the endogenous covariate (Briggs, 2004). The challenge in empirical studies has been to find effective instruments (J. D. Angrist & Krueger, 2001; Baltagi & Khanti-Akom, 1990). The current recommendation is to use an instrument or treatment that has been assigned randomly rather than an individual-related characteristic (Wooldridge, 2018).
In non-experimental data, the challenge of finding the right instrument has led researchers to abandon the IV method (Rossi, 2014; Truong et al., 2021), as weak instruments lead to greater bias than other techniques (Bound et al., 1995; Cruz & Moreira, 2005). For this reason, we have chosen to not use the IV method. Instead, we use the workhorse of such studies—the fixed-effects model with Heckman correction—to control for selection bias (Wooldridge, 2018), with longitudinal data to estimate fixed-effects models (Kamhöfer & Schmitz, 2016; Leigh, 2008). This model controls for average differences across individuals in any observable or unobservable predictors. The within-group estimate explains the effect of covariates on the earnings of an individual over time.
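The within-group idea can be made concrete with a small numerical sketch. The toy data below are hypothetical: each individual has a fixed unobserved level (2.0 and 3.0) plus a true return of 0.05 per unit of x. This illustrates the within transformation only; it is not the xtheckmanfe estimator.

```python
def within_slope(panel):
    """Fixed-effects (within) slope: regress demeaned y on demeaned x."""
    num = den = 0.0
    for obs in panel.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            num += (x - x_bar) * (y - y_bar)
            den += (x - x_bar) ** 2
    return num / den


def pooled_slope(panel):
    """OLS slope that ignores individual effects, for comparison."""
    points = [p for obs in panel.values() for p in obs]
    x_bar = sum(x for x, _ in points) / len(points)
    y_bar = sum(y for _, y in points) / len(points)
    num = sum((x - x_bar) * (y - y_bar) for x, y in points)
    den = sum((x - x_bar) ** 2 for x, _ in points)
    return num / den


# y = individual effect + 0.05 * x, observed in two periods per person.
toy = {
    "a": [(5, 2.0 + 0.05 * 5), (8, 2.0 + 0.05 * 8)],
    "b": [(10, 3.0 + 0.05 * 10), (12, 3.0 + 0.05 * 12)],
}
fe = within_slope(toy)
ols = pooled_slope(toy)
```

In this toy example the within slope recovers 0.05, while the pooled slope is larger because the individual effect is correlated with x. The demeaning also shows why an individual whose x does not change between rounds contributes nothing to the within estimate.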
In the first stage (selection equation), we predicted participation in the workforce with instruments such as Married, Nchildren, and Nadults apart from the explanatory variables of the outcome equation (Semykina & Wooldridge, 2010). The selection equation is stated as:
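The displayed equation is not present in this version of the text. A probit-style sketch consistent with the definitions that follow is given here; the coefficient symbols γ and δ and the label LFP for Labor Force Participation are our own placeholders, and Z denotes the excluded instruments (Married, Nchildren, Nadults):

```latex
\text{LFP}^{*}_{it} = \gamma_0 + X_{it}\gamma + Z_{it}\delta + \varepsilon_{it},
\qquad
\text{LFP}_{it} = \mathbf{1}\left[\text{LFP}^{*}_{it} > 0\right] \tag{3}
```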
where X = the set of exogenous explanatory variables of the outcome equation indicated in Equation 4 and ε = stochastic error term.
In the second stage of the Heckman procedure (outcome equation), we used a semi-log polynomial model to allow second-order covariates for experience (Equation 4). The outcome equation is stated as:
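The displayed equation is not present in this version of the text. A sketch consistent with the covariates discussed in the paper is given here; the individual effect α_i, the year-specific coefficient ρ_t on the IMR, and the coefficient numbering are assumptions. The definitions that follow also mention Z, but how Z enters the original equation cannot be recovered from the text, so it is left out of this sketch:

```latex
\text{lnHourly\_Earnings}_{it} = \beta_0 + \beta_1\,\text{Education}_{it} + \beta_2\,\text{Experience}_{it} + \beta_3\,\text{Experience\_square}_{it} + \beta_4\,\text{Networking\_Intensity}_{it} + \rho_t\,\text{IMR}_{it} + \alpha_i + e_{it} \tag{4}
```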
where IMR = the inverse Mills ratio generated from the first stage selection equation, Z = the set of included instrumental variables from the selection Equation 3 (Married, Nchildren, and Nadults), e = error term, i = ith individual, and t = year (2005 or 2012).
Econometric Models, Estimation, and Tests
First, we estimated the Mincer earnings function using fixed effect (FE) and random effect (RE) models (using the xtreg command in Stata 16.1). Then, we used the Hausman test (hausman command) to confirm that the FE model was more suitable for our analysis. Unlike earlier panel commands for the Heckman model, xtheckmanfe allows us to incorporate FE while addressing selection bias. This, we believe, is a methodological improvement over other previous studies (Briggs, 2004; Schwiebert, 2012), and our estimates update their results.
In the selection Equation 3, we used Married, Nadults, and Nchildren as excluded instruments (Comola & de Mello, 2013). These variables have been used in the recent literature as well (Beam et al., 2020; Sarkar et al., 2019). We anticipated that these variables could be acceptable instruments for selection into the labor market and tested them for confirmation. The model also estimates a value for the inverse Mills ratio (IMR) by year, to test whether selectivity bias exists in the estimation. IMR is included in the outcome Equation 4 as a covariate. A significant value of IMR would imply the existence of selectivity bias. We estimate the coefficients for Equations 3 and 4 by consumption quintiles, caste, gender, and religion.
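For readers unfamiliar with the inverse Mills ratio, the sketch below computes it from a probit index using the standard definition (normal density divided by normal distribution function). The index values are arbitrary illustrations, not estimates from the paper.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_mills_ratio(index):
    """IMR for a selected observation with probit index `index`."""
    return normal_pdf(index) / normal_cdf(index)


# The IMR falls as the predicted probability of participation rises.
imr_low, imr_mid, imr_high = (inverse_mills_ratio(z) for z in (-1.0, 0.0, 1.0))
```

An individual who is very likely to participate has an IMR close to zero, so the correction term matters most for those whose participation was least expected.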
Results
We found that the (natural log of) average hourly earnings of an individual (overall) was 3.02 with a standard deviation of 0.8 (see Table 1). Average education was 7.9 years, with a standard deviation of 4.4 years (overall). The average experience was 19.5 years, with a standard deviation of 13.5 years (overall). Of the total sample, 62% were married and 39% were female. The average number of children was 1.54 and the average number of adults was 3.47. The detailed groupwise values can be found in Supplemental Table S1.
Table 1. Descriptive Statistics.
Source. Author’s calculations based on IHDS I and II (Desai & Vanneman, 2005, 2015).
We tested to ensure there was no multi-collinearity between the independent variables predicting earnings by using the variance inflation factor (VIF) measure. Our overall regression results showed VIF values between 1.44 and 1.88, which is considered moderate and requires no corrective action (Wooldridge, 2018).
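To put the reported VIF range in perspective, the sketch below uses the standard definition VIF = 1 / (1 - R²), where R² comes from regressing one covariate on the others, and inverts it for the two reported bounds. The implied R² values are derived arithmetic, not figures reported in the paper.

```python
def vif(r_squared):
    """Variance inflation factor for a covariate with auxiliary R-squared."""
    return 1.0 / (1.0 - r_squared)


def implied_r_squared(vif_value):
    """Auxiliary R-squared implied by a given VIF."""
    return 1.0 - 1.0 / vif_value


# Reported range of VIF values: 1.44 to 1.88.
r2_low = implied_r_squared(1.44)
r2_high = implied_r_squared(1.88)
```

Under this definition the reported range corresponds to auxiliary R² values of roughly 0.31 to 0.47.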
To decide which of the FE and RE models was more suitable for this study (estimates can be found in Supplemental Table S2), we used the Hausman test (results can be found in Supplemental Table S3). The chi-square test statistic was 186.29 (p < .0001), which indicates rejection of the null hypothesis (that the RE model is appropriate). Therefore, we accepted the alternative hypothesis that the FE model was better suited to our data.
The FE Heckman model (without disaggregation) suggests that earnings are positively and significantly affected by Education (see Table 2). The coefficient of Experience is positive and significant but that of the square term is negative and insignificant. Networking intensity has a positive and significant impact on earnings. The coefficients for IMR (2005 and 2012) are significant and indicate a selection bias, thus justifying the use of the selection model. Selection equation results are reported in Supplemental Table S4.
Table 2. Panel Estimates From the Fixed-effects Heckman Model (Full Sample) (Dependent Variable: lnHourly_Earnings).
Source. Author’s calculations based on IHDS I and II (Desai & Vanneman, 2005, 2015).
Note. Total N = 51,989. CI = confidence interval; LL = lower limit; UL = upper limit.
*p < .1. **p < .05. ***p < .01.
By Consumption Quintiles
The results of the disaggregated estimation with six economic groups individually are presented in Table 3. We found that Education has a positive and significant impact on earnings across all quintiles. Experience too was positive and significant for APL 2 and APL 5. The squared effect was significant and negative for APL 5. Networking intensity was significant and positive for APL 1. The coefficients for IMR were significant for all quintiles (except BPL and APL 5) in 2005 and 2012, confirming the presence of selectivity bias. This also justifies the two-step approach of Heckman in our context. Selection equation results are reported in Supplemental Table S5.
Table 3. Estimates From the Fixed-effects Heckman Selection Model by Consumption Quintiles (Dependent Variable: lnHourly_Earnings).
Source. Author’s calculations based on IHDS I and II (Desai & Vanneman, 2005, 2015).
Note. Standard errors in parentheses.
*p < .1. **p < .05. ***p < .01.
By Gender
We estimated the model by two groups (male and female), as information was provided for these groups only (see Table 4). Education was positive and significant for both males and females. Experience was positive and significant for females and jointly significant for males. The square term (Experience_square) was negative. This suggests an inverted U-shaped relation between earnings and experience for females. The implication is that women’s earnings increase with experience, but at a decreasing rate. Networking intensity was positive and significant for males. The IMR was significant for both groups in 2005 and for males in 2012. Selection equation results are reported in Supplemental Table S6.
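The shape implied by a positive Experience coefficient and a negative Experience_square coefficient can be checked with a short calculation. The coefficient values below are hypothetical, chosen only to illustrate the arithmetic; the estimated values are in Table 4.

```python
def marginal_return_to_experience(beta_exp, beta_exp_sq, experience):
    """Derivative of log earnings with respect to experience."""
    return beta_exp + 2.0 * beta_exp_sq * experience


def turning_point(beta_exp, beta_exp_sq):
    """Experience level at which the quadratic profile peaks (or troughs)."""
    return -beta_exp / (2.0 * beta_exp_sq)


# Hypothetical coefficients: 0.02 on Experience, -0.0004 on Experience_square.
peak = turning_point(0.02, -0.0004)
early = marginal_return_to_experience(0.02, -0.0004, 5)
late = marginal_return_to_experience(0.02, -0.0004, 20)
```

With these illustrative values the marginal return is still positive at 20 years but smaller than at 5 years, and the profile peaks at about 25 years of experience.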
Table 4. Estimates From the Fixed-effects Heckman Selection Model by Gender (Dependent Variable: lnHourly_Earnings).
Source. Author’s calculations based on IHDS I and II (Desai & Vanneman, 2005, 2015).
Note. Standard errors in parentheses.
*p < .1. **p < .05. ***p < .01.
By Caste
We estimated coefficients for four caste groups (see Table 5). We found that education has a positive and significant impact on all castes except SC. Experience was positive and significant for general castes, OBC, and ST. Experience_square was negative and significant for general, OBC (jointly), and ST. A positive coefficient on Experience combined with a negative coefficient on Experience_square suggests an inverted U-shaped relation between earnings and experience for the two groups (general and ST). Networking intensity was positive and significant for SC. The IMR was significant for most of the values, indicating the presence of a selection bias. Selection equation results are reported in Supplemental Table S7.
Estimates From the Fixed-effects Heckman Selection Model by Caste (Dependent Variable: lnHourly_Earnings).
Source. Author’s calculations based on IHDS I and II (Desai & Vanneman, 2005, 2015).
Note. Standard errors in parentheses.
*p < .1. **p < .05. ***p < .01.
By Religious Groups
Education was significant and positive for all three religious groups (Table 6). Experience, on the other hand, was significant and positive for Hindus but not for the other groups. Networking intensity was significant and positive for Hindus only. Selection equation results are reported in Supplemental Table S8.
Estimates From the Fixed-effects Heckman Selection Model by Religion (Dependent Variable: lnHourly_Earnings).
Source. Author’s calculations based on IHDS I and II (Desai & Vanneman, 2005, 2015).
Note. Standard errors in parentheses.
*p < .1. **p < .05. ***p < .01.
Analysis
Our results demonstrate that education is significant and positive for all categories of individuals—economic strata, gender, caste, and religion. However, there is a wide range of returns from education to each of these subgroups.
When we compare returns to education by consumption quintiles (see Figure 1), the values go from 2.4% (BPL) to 8.8% (APL 3). Returns to females (5.2%) are much higher than the returns to males (3.7%).

Projected earnings by consumption quintile with education and experience as covariates.
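To show what the gap between the lowest and highest returns means over several years of schooling, the sketch below compounds the per-year return in a semi-log earnings equation. It assumes that the reported percentages can be read as log-point coefficients (0.024 and 0.088); the five-year horizon is illustrative and not taken from our estimation.

```python
import math


def earnings_multiplier(annual_return: float, extra_years: float) -> float:
    """Earnings ratio implied by a semi-log model after extra_years of schooling."""
    return math.exp(annual_return * extra_years)


# Returns reported in the text, read here as log-point coefficients (an assumption).
RETURNS = {"BPL": 0.024, "APL 3": 0.088}
EXTRA_YEARS = 5  # illustrative horizon

for group, annual_return in RETURNS.items():
    ratio = earnings_multiplier(annual_return, EXTRA_YEARS)
    print(f"{group}: earnings ratio {ratio:.2f} after {EXTRA_YEARS} extra years")
```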
The received literature on gender differentials in earnings has universally found that women earn less than men for a given level of education and experience. We did not find evidence contrary to this. However, we did find that there is a tendency toward convergence in earnings between males and females for given levels of education and experience. This has been noted by other authors as well, and it can be attributed to structural and technological change in part (Cortes et al., 2020). As the workforce has moved from manufacturing to service-oriented employment, women are less likely to be disadvantaged in such work situations. Moreover, their soft skills at the workplace may be more sought after than those of men. Our findings provide evidence of convergence in the long run in the Indian context (see Figure 2). We found that the initial value of earnings (intercept value) for males is much higher than that of females. However, the marginal returns to an additional year of education are much higher for women than men.

Projected earnings by gender with education and experience as covariates.
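The convergence argument can be illustrated with a simple calculation. If two log-earnings lines differ in intercept and slope, they cross where the intercept gap equals the accumulated difference in returns. The sketch below uses the returns reported above (3.7% for males, 5.2% for females), read as log-point slopes, and holds experience equal across groups. The intercept gap is a hypothetical placeholder, not a value from Table 4.

```python
def crossover_years(intercept_gap: float, return_low: float, return_high: float) -> float:
    """Years of education at which the higher-return group catches up.

    intercept_gap is the ln(earnings) advantage of the lower-return group
    at zero years of education.
    """
    if return_high <= return_low:
        raise ValueError("no catch-up unless return_high exceeds return_low")
    return intercept_gap / (return_high - return_low)


MALE_RETURN = 0.037      # from the text, read as a log-point slope (assumption)
FEMALE_RETURN = 0.052    # from the text, read as a log-point slope (assumption)
HYPOTHETICAL_GAP = 0.30  # illustrative male intercept advantage in ln(earnings)

years = crossover_years(HYPOTHETICAL_GAP, MALE_RETURN, FEMALE_RETURN)
print(f"projected earnings cross after about {years:.0f} years of education")
```

Because the difference in returns is only 1.5 percentage points per year, even a moderate intercept gap takes many years of education to close, which is consistent with male earnings remaining higher over long periods.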
Among the caste groups (see Figure 3), the SCs show the highest returns to education (5.9%) followed by the general castes (4.6%). However, general castes have a higher intercept, which elevates their earnings above other castes for almost the entire range of years. Among the religious groups (see Figure 4), Hindus (3.8%) and Muslims (2.3%) have a lower return than other minorities (9.9%). These results suggest that investment in education does pay dividends for social development in India, in alignment with the literature (Tilak, 2021).

Projected earnings by caste with education and experience as covariates.

Projected earnings by religion with education and experience as covariates.
There is, therefore, a need for a targeted approach to ensure that these groups do not fall out of the education system for lack of ability to pay (Mishra & Ramakrishna, 2023). The Right of Children to Free and Compulsory Education (RTE) Act of 2009 makes it obligatory for the government to ensure that every child in the age group of 6 to 14 years is provided free and compulsory education (GoI, 2009). This is a step toward inclusive development (Das, 2020) although there is evidence of unintended outcomes (C. Chatterjee et al., 2020). Educational affirmative interventions based on caste and gender have been found to have positive impacts in India (Bagde et al., 2016). Our findings are along the lines of Chakraborty and Bohara’s (2021) study on the role of caste and religion in India and their recommendation for policy intervention for socially backward classes and Muslims to establish equity in the labor market.
There is a constitutionally mandated reservation of jobs in the public sector for less privileged castes (Bhaskar, 2021). This has helped overcome selection bias against the lower castes in the public sector to some extent. However, reservations for women have been limited to the lowest tier of government despite the longstanding proposal to bring in 33% reservations at all tiers of government and in other areas of employment (Marwah, 2019). The presence of women in government has had positive outcomes in local assets and employment creation (Deininger et al., 2020).
Discussion
Currently, three nationwide panel data sets are available at the household level: the IHDS, the PLFS, and the CPHS. The two-wave IHDS household panel has been used to study the impact of natural disasters (Tamuly & Mukhopadhyay, 2022) and rural insurance (Azam, 2018). It has also been used at the individual level as a repeated cross-section to study domestic violence (Akter & Chindarkar, 2020) and as an individual panel to study women’s health (E. Chatterjee & Sennott, 2021). Sarkar et al. (2019) also use a panel similar to ours but limit themselves to the determinants of women’s participation in the labor force in India.
The Government of India also publishes a new data set on employment and unemployment status, the PLFS, which is collected every quarter (NSO, 2021). Although the data are collected every quarter, each household is interviewed only once a year, making it a rotational panel. These data have been used by researchers such as Sengupta (2023) to estimate the Mincer equation with a Heckman selection model to predict wages in India. However, some studies have used pooled unit-level data, and their estimates are therefore likely to share the limitations of cross-section models, such as omitted variable bias and the inability to capture dynamic relationships, among other issues (Hsiao, 2007). We overcome these estimation limitations by using fixed-effect panel data. Our study is, therefore, probably the first to use a panel data set to explain the impact of education and experience on earnings in India. While we confirm Sengupta’s (2023) finding that women earn less than men, we add a more nuanced view of the gender gaps in the labor market to the existing literature. We find that while women have lower starting earnings, they have higher incremental earnings as they gain in education and experience. The panel characteristic of the PLFS database has been used to study urban employment in India (Roy et al., 2022). However, it has not been exploited to respond to the question of wage determination.
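The advantage of a panel over pooled data can be seen in a small constructed example. With two waves, first differencing removes any time-invariant individual effect in the same way as the within transformation. The sketch below is a toy illustration under stated assumptions, not our estimation code: the data are made up, the individual effect is built to be correlated with schooling, and the selection correction is omitted.

```python
def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x with an intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


TRUE_RETURN = 0.05  # hypothetical return to a year of education

# Hypothetical (schooling in wave 1, schooling in wave 2) for five individuals.
schooling = [(8, 9), (10, 12), (12, 12), (14, 15), (5, 8)]

# Time-invariant individual effect, constructed to rise with initial schooling.
effects = [0.1 * s1 for s1, _ in schooling]

ln_w1 = [a + TRUE_RETURN * s1 for a, (s1, _) in zip(effects, schooling)]
ln_w2 = [a + TRUE_RETURN * s2 for a, (_, s2) in zip(effects, schooling)]

# Pooled regression ignores the individual effect and is biased upward here.
pooled = ols_slope([s1 for s1, _ in schooling] + [s2 for _, s2 in schooling],
                   ln_w1 + ln_w2)

# First differences cancel the individual effect before estimation.
d_school = [s2 - s1 for s1, s2 in schooling]
d_ln_w = [w2 - w1 for w1, w2 in zip(ln_w1, ln_w2)]
differenced = ols_slope(d_school, d_ln_w)

print(f"pooled slope: {pooled:.3f}, first-difference slope: {differenced:.3f}")
```

In this constructed case the pooled slope overstates the return because it absorbs the individual effect, while the differenced slope recovers the value used to generate the data.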
The other panel data set at the household level currently available is the CPHS. However, the focus of this database is on tracking consumption (Abraham & Shrivastava, 2022) and, similar to the PLFS, it is too short to study medium or long-term impacts on wages attributable to education or experience. Singha Roy (2020) uses a smaller regional panel of 18 villages in semi-arid regions to study wage determination. This study reports that education is significant for men at all levels of education but only significant at the level of graduation for women. A more elaborate study by Khanna (2023) uses a combination of data sets that link individual-level wages to district characteristics to create a district-level panel data set. This study suggests that men gain more than women from additional years of education in terms of wages. We find contrary evidence that even though women start with lower wages than men, their marginal increase for additional years of education and experience outpaces the growth of wages for men.
To our understanding, no study in the Indian context has thus far used a nationwide representative household-level panel data set to examine the role of education and experience in wage determination. Our results, therefore, update all earlier estimates in this context.
In a recent global review, Psacharopoulos and Patrinos (2018) examined 1,120 estimates in 139 countries. They found that the average return from an additional year of education was about 9%. Earlier, Peet et al. (2015) had found the return to education to be about 7.6% in developing countries and this, according to them, was not dissimilar to developed countries. In Indonesia, the impact of school expansion led to an increase in wages by 6.8% to 10.6% (Duflo, 2001) and returns from higher education were about 15% (Yubilianto, 2020). Most studies indicate heterogeneity in returns by gender and region within and across countries, with a higher return for women and urban workers.
In India, Kingdon and Theopold (2008) used two rounds of NSSO data (1993–1994 and 1999–2000) and found the returns to be 8.2% (1993–1994) and 7.7% (1999–2000) using an OLS regression, which were similar to the estimates from the Heckman model. In a recent study, Bairagya (2020) used the IHDS II data and found the returns to be between 5.6% (OLS) and 9.4% (IV approach). Geetha Rani (2014) found the returns to education to be between 14% (OLS) and 5.1% (Heckman selection model) using the IHDS I data. Fulford (2014) reported gender-disaggregated returns to an additional year of education, finding private returns of 9.3% (for males) and (-) 6.5% (for females) using OLS with NSSO data. Mendiratta and Gupt (2013) report the highest returns at 15%, using a pseudo-panel approach with NSSO data for two rounds (2004–2005 and 2009–2010). When returns to education were estimated in a general equilibrium (GE) framework, Khanna (2023) found that skilling led to an increase in returns by 19.9% (without GE effects) and 13.4% (with GE effects).
There seems to be a wide range of variation in the reported private returns to education in India. However, all earlier estimates in the Indian context have used cross-sectional data or a pseudo-panel approach. We improve on all of them by using a nationwide household-level panel, with fixed effects for the Heckman selection model approach. Our results are comparable to some of the international findings on returns to education on average (Hicks & Duan, 2023; Klasen et al., 2021; Korwatanasakul, 2023).
Our study also looked at the role of social capital by examining the impact of networking intensity on different groups. We find that this variable had a positive impact on the earnings of males (among gender groups), Hindus (among religious groups), and APL 1 (among consumption quintiles). Our results are in line with the findings of a study based in China (Liu, 2017), where individual-level social capital was found to have a greater impact on men than women. Similar effects were reported by Smith (2000) in the context of the United States. Behtoui and Neergaard (2010) found that marginalized groups in Sweden had lower wages on average due to a lack of access to social capital. This could be anticipated in India and might explain the wage gaps for women and marginalized social and religious groups (Deshpande & Khanna, 2021).
Conclusion
Our findings have important policy implications and connect to multiple SDGs, such as poverty (SDG 1), quality education (SDG 4), gender equality (SDG 5), and reduced inequalities (SDG 10). The government of India has been committed to the fulfilment of the SDGs and has undertaken multiple social interventions. In the sphere of education as well, policies have been developed to enhance access and improve quality education. About a decade and a half ago, the Right of Children to Free and Compulsory Education Act or Right to Education Act, 2009, for free and compulsory schooling (GoI, 2009) was enacted to reduce the dropout rate in schools and ensure compulsory education up to 14 years. The New Education Policy announced in 2020 also attempts to upgrade quality education. Our findings provide evidence that education (SDG 4) helps in achieving the goals of gender equality (SDG 5) and social justice for marginalized (caste and religious) groups (SDG 10).
The government has also implemented affirmative action policies in the domain of jobs and education for certain groups based largely on caste (Deshpande, 2013) and, more recently, economic backwardness (Nath, 2019). We find that despite education positively impacting earnings, the less privileged groups would require affirmative action to achieve reduced inequalities (SDG 10) and escape poverty (SDG 1).
Our study addresses the issues of economic distribution, gender, caste, and religion in the context of larger sustainable developmental goals (Dhar, 2018; Pandey, 2019). Our findings of higher marginal returns to education among females justify the need for greater investment in education for women. The lower returns on education among STs and Muslims indicate the need for affirmative action for these groups. Similarly, the positive and significant coefficient for all consumption quintiles suggests that anti-poverty programs in combination with educational opportunities for the less privileged (BPL) would meet the goals of social justice. Development policies that combine multiple objectives would therefore have a more effective welfare outcome.
While several innovations have been implemented in our study, we acknowledge some limitations. We were unable to separate the impact of ability and inherent skills from the years of education in wage determination (ability bias). The reporting of caste was non-uniform in the two rounds of IHDS. This could be due to the re-classification of castes, which was an ongoing process at the time of the survey. This could be further resolved when data from new ongoing survey rounds becomes available. Also, our study did not stratify the labor market by state and sector. Incorporating these classifications could yield new insights in a country like India with large geographic and economic heterogeneity.
Supplemental Material
Supplemental material, sj-docx-1-sgo-10.1177_21582440231220942 for How Much do Education, Experience, and Social Networks Impact Earnings in India? A Panel Data Analysis Disaggregated by Class, Gender, Caste and Religion by Yasser Razak Hussain and Pranab Mukhopadhyay in SAGE Open
Footnotes
Acknowledgements
We would like to thank five anonymous reviewers of the journal for their comments on an earlier draft. These have helped improve our paper significantly.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Ethical Approval
An ethics statement for animal and human studies is not applicable.
Data Availability Statement
The data that support the findings of this study are available in the public repository from the Data Sharing for Demographic Research program of ICPSR, the Inter-university Consortium for Political and Social Research: https://www.icpsr.umich.edu/web/DSDR/studies/22626
Supplemental Material
Supplemental material for this article is available online.
References
