Untangling the network effects of productivity and prominence among scientists
We begin by extracting pairs of coauthors defined across 20.0 million research articles in the Microsoft Academic Graph (MAG) database since 1950 [48,49], across six STEM fields: biology, chemistry, computer science, mathematics, medicine, and physics. To better isolate the most important network connections, we focus on the coauthorship links defined by the first and last authors of each paper. Subsetting to only the first-last author pairs eliminates the network effects on productivity and prominence caused by variations in the number of coauthors per paper, middle-author contributions of all types, trends over time and across fields in team sizes, and other related confounds. This selection preserves and focuses our analysis on the most important collaboration links according to common coauthorship norms in STEM fields, e.g., traditional mentor-mentee relationships, where the junior scholar is typically the first author and their senior colleague is the last author.
The nature of coauthorship in scientific publications tends to confound direct measures of the productivity and prominence of individual scientists. Highly productive scientists tend to have many collaborators, often including each other, and the productivity of these individuals tends to lift the productivities of others by virtue of those collaborations. In the same way, highly cited scientists tend to increase the prominence of their collaborators, and often, the same collaborators are both highly productive and highly cited. Bibliometric normalization schemes, such as fractional authorship, can be viewed as paper-level adjustments for these network effects of collaboration.
However, untangling the network effects of collaborations over a scientific career to estimate each individual’s contributions within the interdependent context of coauthorship networks requires a generative network model. Here, we introduce two such models that can control for these collaboration network effects and allow us to quantify the latent productivity and prominence of individual researchers, and their relationship with social and epistemic inequalities in scientific careers.
We model the production of publications by a pair of coauthors as a stochastic outcome of their joint efforts, governed by a linear combination of their individual latent productivity parameters (Fig. 1a). Mathematically, the number of coauthored publications is the output of a pairwise Poisson process, parameterized by the sum of the latent individual productivities λi and λj for coauthor pair (i,j). Hence, the model parameter λi gives the expected number of publications per year for author i, and for an author pair (i,j), their joint productivity is a random variable of the form
$$P(N_{ij},t_{ij}\mid\lambda_i,\lambda_j)=\frac{e^{-(\lambda_i+\lambda_j)t_{ij}}\,[(\lambda_i+\lambda_j)t_{ij}]^{N_{ij}}}{N_{ij}!},$$
(1)
where Nij is the observed number of papers coauthored by authors i and j over a total collaboration time period tij (see Methods).
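To make Eq. (1) concrete, the following is a minimal Python sketch of the pairwise log-probability. The function name and the numerical values (latent rates of 0.4 and 0.3 papers per year, a 5-year collaboration, 3 joint papers) are hypothetical and chosen only for illustration; they are not estimates from the dataset.

```python
import math

def pair_log_likelihood(n_ij, t_ij, lam_i, lam_j):
    """Log of Eq. (1): Poisson probability of n_ij joint papers in t_ij years."""
    rate = (lam_i + lam_j) * t_ij  # expected number of joint papers
    return -rate + n_ij * math.log(rate) - math.lgamma(n_ij + 1)

# Hypothetical coauthor pair (illustrative values, not from the paper).
expected_papers = (0.4 + 0.3) * 5
log_p = pair_log_likelihood(n_ij=3, t_ij=5, lam_i=0.4, lam_j=0.3)
```

Working in log space avoids overflow in the factorial for prolific pairs, which is why `math.lgamma` is used in place of `N_ij!`.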
Similarly, we model prominence, defined as the number of high-impact publications, as a joint function of individual latent parameters (Fig. 1a). Mathematically, researcher prominence is modeled by a Binomial distribution, parameterized by the sum of the latent individual prominences θi and θj of the coauthor pair (i,j). Hence, the model parameter θi gives the expected fraction of publications with i as an author that will be highly cited, and for an author pair (i,j), their joint prominence is a random variable of the form
$$P(N_{ij},m_{ij}\mid\theta_i,\theta_j)=\binom{N_{ij}}{m_{ij}}(\theta_i+\theta_j)^{m_{ij}}[1-(\theta_i+\theta_j)]^{N_{ij}-m_{ij}},$$
(2)
where mij is the observed number of highly cited papers coauthored by authors i and j over a total collaboration time period tij (see Methods). We note that both models assume conditional independence across publications, which may obscure some interesting temporal effects [50]. Applying these joint productivity and prominence models to all pairs of coauthors in a collaboration network gives two joint log-likelihood functions. Maximizing each one independently yields a set of individual productivity and prominence parameters that effectively control for the network effects of coauthorship on the variables of interest:
$$L(\boldsymbol{\lambda})=\sum_{i\ne j}\log P(N_{ij},t_{ij}\mid\lambda_i,\lambda_j)\qquad L(\boldsymbol{\theta})=\sum_{i\ne j}\log P(N_{ij},m_{ij}\mid\theta_i,\theta_j).$$
(3)
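As an illustration of how L(λ) in Eq. (3) could be maximized, here is a small pure-Python sketch. The text does not say which optimizer was used, so the multiplicative EM-style update below is an assumption of this sketch, as are the function names and the three-author toy network. The toy counts were constructed so that the exact maximum is λ = (0.5, 0.3, 0.2).

```python
import math

def fit_latent_productivity(pairs, n_authors, n_iter=500):
    """Maximize L(lambda) for the Poisson model of Eq. (1).

    pairs: list of (i, j, n_ij, t_ij), one entry per unordered coauthor pair.
    Assumed update rule (EM-style, not taken from the paper):
        lam_i <- lam_i * sum_j n_ij / (lam_i + lam_j) / sum_j t_ij
    Assumes every author appears in at least one pair with n_ij > 0.
    """
    lam = [1.0] * n_authors
    total_time = [0.0] * n_authors
    for i, j, _, t in pairs:
        total_time[i] += t
        total_time[j] += t
    for _ in range(n_iter):
        ratio = [0.0] * n_authors
        for i, j, n, _ in pairs:
            r = n / (lam[i] + lam[j])  # observed count over current joint rate
            ratio[i] += r
            ratio[j] += r
        lam = [lam[k] * ratio[k] / total_time[k] for k in range(n_authors)]
    return lam

def log_likelihood(pairs, lam):
    """Sum of the pairwise Poisson log-probabilities, as in Eq. (3)."""
    total = 0.0
    for i, j, n, t in pairs:
        rate = (lam[i] + lam[j]) * t
        total += -rate + n * math.log(rate) - math.lgamma(n + 1)
    return total

# Toy triangle of three authors, each pair observed for 10 years.
toy_pairs = [(0, 1, 8, 10.0), (0, 2, 7, 10.0), (1, 2, 5, 10.0)]
lam_hat = fit_latent_productivity(toy_pairs, n_authors=3)
```

The prominence likelihood L(θ) has the same sum-over-pairs structure, but it additionally requires 0 < θi + θj < 1 for every pair, so a constrained numerical optimizer would be a natural choice there.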
Applied to our full dataset of 198,202 mid-career researchers across six STEM fields, defined as researchers with at least 15 years of scholarly publishing activity (see Supplementary Information), we find compelling evidence that these latent parameter models yield a useful individual decomposition of the observed joint productivities and prominences of collaborating scientists (Fig. 1b and Supplementary Fig. 3 for individual fields). Examining the marginal distributions, we find that the latent productivity and prominence variables are nearly orthogonal (Pearson’s r = 0.09, \(p < 10^{-3}\)), with λ following a Normal distribution and θ following a heavy-tailed distribution. That is, controlling for network effects, we find that individual productivity of mid-career researchers is low variance and concentrated around a central tendency of μλ = 0.39 first/last-authored papers per year (standard deviation σλ = 0.15), with only the top 0.02% of researchers exhibiting a latent productivity of \(\hat{\lambda } \; > \;2\) first/last-authored papers per year.
In contrast, controlling for network effects, individual prominence is highly variable, with an average prominence of μθ = 0.04 (on average, for publications written by two authors, 1 out of 12.5 will be highly cited), but a standard deviation twice as large (σθ = 0.08). That is, a large majority of researchers have low individual prominence, while a minority generate a long tail of much greater impact, much like measures of popularity and wealth in other complex social systems [51]. Furthermore, both of these estimated parameters have low correlation with a researcher’s career-wise raw productivity, with the Pearson correlation coefficients rλ,N = 0.21 and rθ,N = −0.02. This implies that after controlling for the network effects of collaboration, the latent parameters could indicate the productivity and prominence of individual researchers in a given unit time period. As a technical aside, we note that parameter estimates for these models are more stable for researchers with at least 10 papers, and appear to underestimate latent productivity λ and overestimate prominence θ for less productive authors (Supplementary Fig. 5). The distribution of θ does not qualitatively change when we alter the threshold of highly-cited papers (Supplementary Fig. 6).
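As a quick check of the "1 out of 12.5" figure, assume two coauthors who each have the average latent prominence. Under the binomial model of Eq. (2), the probability that one of their joint papers is highly cited is the sum of their parameters:

$$\theta_i+\theta_j = 2\mu_\theta = 2\times 0.04 = 0.08 = \frac{1}{12.5}.$$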
If the estimated individual productivity and prominence parameters λ and θ are genuinely measuring individual-level characteristics, controlling for network effects from collaboration, then they should only loosely correlate with their corresponding network-confounded measures of raw productivity and raw prominence. We evaluate the efficacy of these two measures by characterizing their correlation with other “unadjusted” measures and time-related dynamics for individual researchers. We first select a cohort of minimally productive mid-career researchers who have published at least 10 papers by their 15th year, and tabulate a correlation matrix of estimated individual parameters and observed scholarly statistics, based on their publications through their mid-career (Fig. 1c). We define a researcher to be “high λ” or “high θ” if their individual estimated parameter is in the upper 10th percentile of same-field researchers for a given year. And, we define a high λ or θ coauthor as a collaborator who is themselves a high λ or θ author and has published at least three papers by the year of relevant collaboration. This correlation analysis reveals that a researcher’s individual λ and θ values correlate only moderately with their “unadjusted” productivity and prominence (λ with papers, Pearson’s r = 0.21; θ with citations, Pearson’s r = 0.36), indicating that the model parameters are capturing behavior above and beyond what the unadjusted counts provide. And, we find strong evidence of the network effects of collaborations in driving the observed productivity and prominence of individual researchers, because the number of high λ and high θ coauthors correlates more strongly with individual productivity and prominence (papers vs. high λ coauthors, Pearson’s r = 0.70; citations vs. high θ coauthors, Pearson’s r = 0.49) than do the individual’s own model parameters. 
Hence, these network models can shed new light on the substantial but often hidden role that social networks can play in determining individual career metrics.
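The "high λ" and "high θ" labels used above amount to a within-group percentile cut. The sketch below shows one way to compute such a flag using only the Python standard library. The record layout (`id`, `field`, `year`, `lam`), the function name, and the specific percentile convention (the default "exclusive" method of `statistics.quantiles`) are assumptions of this example, not details given in the text.

```python
from collections import defaultdict
from statistics import quantiles

def flag_top_decile(records, value_key):
    """Return ids of researchers above the 90th percentile of `value_key`
    within their own (field, year) group."""
    groups = defaultdict(list)
    for r in records:
        groups[(r["field"], r["year"])].append(r)
    flagged = set()
    for members in groups.values():
        if len(members) < 2:
            continue  # quantiles() needs at least two observations
        # quantiles(..., n=10) returns 9 cut points; the last is the 90th percentile
        threshold = quantiles([m[value_key] for m in members], n=10)[-1]
        flagged.update(m["id"] for m in members if m[value_key] > threshold)
    return flagged

# Hypothetical cohort: ten physicists observed in the same year.
cohort = [{"id": k, "field": "physics", "year": 2005, "lam": 0.1 * (k + 1)}
          for k in range(10)]
high_lam = flag_top_decile(cohort, "lam")
```

Grouping by field and year before taking the cut keeps the flag comparable across fields with different publication rates.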
Similarly, if the estimated individual latent parameters are measuring a researcher’s underlying characteristics, they should remain relatively stable over an individual’s career path, even as their collaboration network evolves. Compared to a fully randomized null model, we find that high λ or high θ researchers are more likely to remain in the same percentile group after 10 years (see Supplementary Information, and Supplementary Figs. 7–9). Furthermore, researchers with high latent parameter values in their early career (first 5 years of publishing) are also more likely by their mid-career to be in the upper 5th percentile of citations among researchers who publish in a given field in a given year. And, this pattern holds when we repeat the analyses in matched-pair experiments, in which we match researchers on their institutional prestige, productivity, and prominence in their early career (Supplementary Fig. 10, Supplementary Tables 1–4). These results indicate that an individual researcher’s estimated model parameters for productivity and prominence are relatively stable over a career, suggesting that they are capturing underlying scholarly behavior independent of changes in collaboration patterns over time, as intended.
In agreement with past studies, we find gendered inequalities in observed measures of both career-wise productivity (Fig. 2a) and prominence (Fig. 2d) among mid-career STEM researchers, in which men both publish more papers and receive more citations than women [22,52,53]. On average, men in these fields publish a total of 20.3 papers by the time they reach their mid-career (first 15 years) compared to 18.3 papers by women (t-test, t = 24.5, p < 0.001, Cohen’s d = 0.15 ± 0.01), and, on average, men’s past publications receive 346.0 total citations compared to 330.1 citations for women’s (t-test, t = 4.9, p < 0.001, Cohen’s d = 0.03 ± 0.01). In other words, men’s average total productivity is 11.0% greater and they receive 5.0% more citations than women by mid-career, and these disparities are stable over time. For researchers with at least three publications in the first 5 years of their publishing career, i.e., in their early career, the probability of persisting until mid-career is 20.6% for men but only 15.7% for women, in agreement with the well-known higher drop-out rate for early-career female scientists [53]. Despite these differences in observed scholarly metrics, controlling for collaboration via our network models reveals a different pattern: across fields, the average mid-career latent productivity parameter is \(\hat{\lambda }=0.39\) for both men and women (t-test, t = 0.7, p = 0.51, Cohen’s d < 0.01), and the average mid-career latent prominence parameter is \(\hat{\theta }=0.044\) for men and 0.045 for women (t-test, t = 0.82, p = 0.41, Cohen’s d < 0.01). That is, men and women exhibit statistically indistinguishable individual latent productivities and latent prominences, implying that the differences in observed scholarly metrics are likely caused by gendered differences in the structure and composition of researcher collaboration networks (Fig. 2b, e).
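The two kinds of effect size quoted above, a relative gap between group means and Cohen's d, can be sketched as follows. The text does not spell out its formulas, so the pooled-standard-deviation form of Cohen's d used here is an assumption, and the two small samples are hypothetical. Applied to the rounded means 20.3 and 18.3, the relative gap comes out at about 10.9%, which matches the reported 11.0% up to rounding of the published means.

```python
import math
from statistics import mean, variance

def relative_gap(mean_a, mean_b):
    """Fractional amount by which mean_a exceeds mean_b."""
    return mean_a / mean_b - 1.0

def cohens_d(sample_a, sample_b):
    """Cohen's d with a pooled standard deviation (assumed convention)."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / (n_a + n_b - 2)
    return (mean(sample_a) - mean(sample_b)) / math.sqrt(pooled_var)

# Reported mean paper counts by mid-career: 20.3 (men) vs. 18.3 (women).
gap = relative_gap(20.3, 18.3)

# Hypothetical toy samples, used only to exercise the function.
d_toy = cohens_d([2, 4, 6], [1, 3, 5])
```

A Cohen's d near 0.15, as reported for productivity, means the group means differ by about 0.15 pooled standard deviations, so a large t statistic here reflects the sample size more than the size of the gap.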
Furthermore, we find that the gendered gaps for mid-career researchers can be largely explained by variation in the number of direct coauthors in their collaboration networks. Matching women and men researchers by institutional prestige, year of first publication, and field, we still find a gendered disparity in which women’s productivity and prominence are lower relative to matched men (Fig. 2c, f). However, additionally matching on the number of coauthors largely eliminates these gendered disparities in both productivity (10.5%, t-test, t = 24.5, p < 0.001, Cohen’s d = 0.15 ± 0.01 vs. 0.7%, t-test, t = 1.3, p = 0.20, Cohen’s d = 0.01 ± 0.01) and prominence (12.8%, t-test, t = 4.9, p < 0.001, Cohen’s d = 0.03 ± 0.01 vs. 2.3%, t-test, t = 2.0, p = 0.04, Cohen’s d = 0.02 ± 0.01). Hence, we find substantial evidence that the well-known gendered productivity and prominence inequalities among women and men researchers can be largely explained as a network effect, in which the composition and size of local collaboration networks differ between men and women, and these differences lead to the observed differences in scholarly metrics, rather than any inherent difference in the researchers themselves. We note that this analysis does not establish a causal relationship, and hence known causal factors, such as the gendered impact of parenthood on researchers that leads to a productivity penalty for mothers as they undertake more childcare duties [54], likely influence both productivity and collaboration networks. We also test the robustness of our findings by selecting mid-career researchers with at least 20 publications (Supplementary Fig. 13) and repeating the analysis by randomly sampling a tertile of researchers (Supplementary Fig. 14), showing that these different choices do not change the qualitative nature of our conclusions.
Overall, these results suggest that collaboration networks can be viewed as a form of social capital that is distributed in unequal and gendered ways in STEM, which mediates or shapes the amount of scholarly contributions and their visibility.
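The matching logic described above can be illustrated with a simple exact-matching sketch. The text names the matching variables but not the algorithm, so the one-to-one pairing within strata, the gap definition (one minus the ratio of group means), and every record below are assumptions made for illustration. The toy records were deliberately constructed so that paper counts depend only on the number of coauthors; they show the mechanism and are not evidence for it.

```python
from collections import defaultdict
from statistics import mean

def matched_pairs(researchers, keys):
    """Pair each woman with a man who has identical values on `keys`."""
    strata = defaultdict(lambda: {"W": [], "M": []})
    for r in researchers:
        strata[tuple(r[k] for k in keys)][r["gender"]].append(r)
    pairs = []
    for group in strata.values():
        # zip stops at the shorter list, so unmatched researchers are dropped
        pairs.extend(zip(group["W"], group["M"]))
    return pairs

def gender_gap(pairs, outcome):
    """Fraction by which matched women's mean falls below matched men's."""
    women = mean(w[outcome] for w, _ in pairs)
    men = mean(m[outcome] for _, m in pairs)
    return 1.0 - women / men

def person(gender, n_coauthors, papers):
    # Hypothetical record; field, start year, and prestige are held fixed.
    return {"gender": gender, "field": "biology", "first_year": 2000,
            "elite": False, "n_coauthors": n_coauthors, "papers": papers}

toy = [person("W", 5, 10), person("W", 5, 10), person("W", 10, 20),
       person("M", 5, 10), person("M", 10, 20), person("M", 10, 20)]

base_keys = ["field", "first_year", "elite"]
gap_coarse = gender_gap(matched_pairs(toy, base_keys), "papers")
gap_fine = gender_gap(matched_pairs(toy, base_keys + ["n_coauthors"]), "papers")
```

In this toy setting the gap is 20% under coarse matching and vanishes once the number of coauthors is added to the matching keys, mirroring the direction of the reported result.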
If a researcher’s collaboration network acts like a form of social capital, we should expect key dynamics of social capital to apply in collaboration networks as well. For instance, an author’s collaboration network capital should be “transferable” to some degree between researchers. For example, collaboration by an early-career researcher with a high λ or high θ senior coauthor should enhance the junior researcher’s productivity or prominence in a way that persists into their own mid-career, compared to similar researchers without such a collaboration. For this analysis of junior-senior collaborations, we select pairs in which, at the time of collaboration, the early-career researcher is 5 or fewer years since their first publication, and the senior coauthor is 6 or more years since their first publication. Because the model estimates of individual latent parameters are more accurate for researchers with more papers, we restrict our analysis here to early-career coauthors and their senior coauthors that have at least three papers by the time of collaboration.
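The selection rule for junior-senior pairs can be written as a short predicate. This is a sketch under stated assumptions: the record fields (`first_pub_year`, `paper_years`) and the function name are hypothetical, and counting papers published in the collaboration year itself toward the three-paper minimum is a choice made here, not one stated in the text.

```python
def is_junior_senior_pair(junior, senior, year):
    """True if the pair meets the career-age and paper-count criteria
    for a collaboration that takes place in `year`."""
    def career_age(author):
        return year - author["first_pub_year"]

    def papers_by(author):
        # Assumption: papers from the collaboration year are included.
        return sum(1 for y in author["paper_years"] if y <= year)

    return (career_age(junior) <= 5      # early-career: 5 or fewer years
            and career_age(senior) >= 6  # senior: 6 or more years
            and papers_by(junior) >= 3
            and papers_by(senior) >= 3)

# Hypothetical authors, for illustration only.
junior = {"first_pub_year": 2008, "paper_years": [2008, 2009, 2010]}
senior = {"first_pub_year": 1995, "paper_years": [1995, 1999, 2003, 2010]}

eligible_2010 = is_junior_senior_pair(junior, senior, 2010)
eligible_2009 = is_junior_senior_pair(junior, senior, 2009)
```

In this example the pair qualifies in 2010 but not in 2009, because the junior author has only two papers by 2009.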
We find that early-career researchers are significantly more likely to collaborate with high λ or θ senior researchers if they are based at elite institutions, which we define as research institutions whose authoritative ranking is among the top 10 in a given field (see Methods), indicating that the composition of collaboration networks itself varies with environmental prestige [55]. This may be largely due to a selection effect in which high λ or θ senior researchers are more likely to work at elite institutions, reflecting inequalities in access to important social networks among early-career researchers. In particular, at the level of pairwise coauthorships, the probability that an early-career researcher collaborates with a high λ (productivity) senior researcher is 0.177 at elite institutions vs. 0.145 at non-elite institutions (t-test, t = 19.3, p < 0.001, Cohen’s d = 0.09 ± 0.01), and the probability of collaborating with a high θ (prominence) senior researcher is 0.141 at elite institutions vs. 0.067 at non-elite institutions (t-test, t = 50.2, p < 0.001, Cohen’s d = 0.28 ± 0.01).
However, regardless of the institution, researchers who collaborated with high λ or high θ senior coauthors early in their career are significantly more likely to themselves be highly prominent researchers in their mid-career, defined as being in the upper 5th percentile of citations among all active researchers in a given year and field (Fig. 3a, c). In particular, collaborating with at least one high λ senior coauthor in the first 5 years of a researcher’s career increases the probability of subsequently being a highly prominent researcher in the 15th career year from 16.2 to 29.5% (t-test, t = 65.0, p < 0.001, Cohen’s d = 0.34 ± 0.01; Fig. 3a). And, a high θ senior coauthor more than doubles that mid-career probability, from 16.3 to 39.8% (t-test, t = 81.6, p < 0.001, Cohen’s d = 0.61 ± 0.01; Fig. 3c). For both types of collaboration patterns, junior researchers from elite institutions exhibit higher productivity and prominence in the mid-career than do peers at less prestigious institutions, a disparity that reflects the value of prestigious environments [55]. This institution-based gap is larger for early-career researchers that have collaborated with high θ coauthors than with high λ coauthors.
However, the early-career benefits of a high λ or high θ senior coauthor appear to decrease modestly with that coauthor’s career age (Fig. 3b, d). This finding contrasts with past studies of scientific mentorship [56,57], which have typically relied on unadjusted citation counts that are naturally larger for more senior collaborators and which represent a stronger confounding network effect. By correcting for the network effect of collaboration, we find instead that the benefits of collaborating with highly productive or highly prominent senior coauthors do not increase with coauthor seniority. Rather, they decrease with career age of the senior coauthor, and decrease more for high λ coauthors, suggesting that the transfer of social capital from senior to junior researchers through collaboration is more effective earlier in the career of senior coauthors. We also test the robustness of our results by selecting senior collaborators with at least six publications and at least ten publishing career years by the time of relevant collaboration (Supplementary Fig. 15), and we find that the different thresholds do not qualitatively change our findings.
Finally, we consider the impact of environmental prestige on latent productivity and prominence of mid-career researchers. Past work has shown that working at a more prestigious institution drives greater productivity and prominence among early-career researchers [55]. However, as with past work on the impact of mentorship, such insights were derived from scholarly measures that did not control for the network effects of collaboration, which increase as a career progresses. Across six STEM fields, researchers in our dataset affiliated with elite institutions on average publish a total of 21.8 papers up to their mid-career (first 15 years), which is 8.5% greater than the 20.1 for researchers at non-elite institutions (t-test, t = 11.5, p < 0.001, Cohen’s d = 0.11 ± 0.02, Fig. 4a). And, over the same career time, researchers at elite institutions receive on average 493.7 citations, which is 62.1% greater than the 304.5 citations received by researchers at non-elite institutions (t-test, t = 27.8, p < 0.001, Cohen’s d = 0.38 ± 0.02, Fig. 4d). Hence, in unadjusted scholarly metrics, researchers at elite institutions have marginally higher productivity and a substantially higher impact.
We find that these productivity and prominence advantages for researchers working in prestigious environments also appear in our estimated individual latent parameters. Researchers at elite institutions, on average, also exhibit a marginally greater latent productivity than those at non-elite institutions (λ = 0.394 vs. 0.387; 1.8% greater; t-test, t = 6.0, p < 0.001, Cohen’s d = 0.05 ± 0.02, Fig. 4b). And, these same researchers, on average, exhibit nearly double the latent prominence of researchers at non-elite institutions (θ = 0.071 vs. 0.037; 91.9% greater; t-test, t = 36.7, p < 0.001, Cohen’s d = 0.43 ± 0.02, Fig. 4d). Hence, compared with raw scholarly metrics, controlling for the network effects of collaboration leaves a smaller but still significant advantage in productivity, and an even larger advantage in prominence, for researchers working at elite institutions. The persistence of the advantages of elite environments after controlling for network effects suggests that other factors likely drive these differences [55], e.g., differences in resources, the size of collaboration networks, or selection effects that apply primarily to mid-career researchers. In addition, we find that the results do not qualitatively change when we modify the number of selected elite institutions to the top 20 (Supplementary Fig. 16).
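The relative advantages quoted in this and the preceding paragraph follow directly from the reported means (elite divided by non-elite), up to rounding. For papers, citations, latent productivity, and latent prominence, respectively:

$$\frac{21.8}{20.1}\approx 1.085,\qquad \frac{493.7}{304.5}\approx 1.621,\qquad \frac{0.394}{0.387}\approx 1.018,\qquad \frac{0.071}{0.037}\approx 1.919.$$

That is, the elite advantage shrinks from 8.5% to 1.8% when moving from raw paper counts to latent productivity, but grows from 62.1% to 91.9% when moving from raw citations to latent prominence.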
Some of this prestige advantage can be explained by differences in the composition of a mid-career researcher’s collaboration networks. Matching researchers in our sample by field and year of first publication, we find that researchers at non-elite institutions are only 6.8% less productive than those at elite institutions (Fig. 4c). However, further matching on variables that quantify the composition of a researcher’s collaboration network, and in particular, the number of coauthors, number of high λ coauthors, and number of high θ coauthors, we find that researchers at non-elite institutions are 2.8% more productive than those at elite institutions (t-test, t = 3.1, p < 0.01, Cohen’s d = 0.04 ± 0.02). These network effects are even stronger for the prominence of individual researchers. Matching researchers by field and year of first publication, researchers at non-elite institutions receive 39.9% fewer citations than those at elite institutions, while further matching on collaboration network variables shrinks this gap to only 19.9%. Hence, in contrast to gendered differences (Fig. 2), we find that the inequalities in productivity and prominence associated with environmental prestige cannot be explained entirely by differences in the structure of collaboration networks, suggesting that additional prestige-related variables play an important role in driving the greater scholarly impact of researchers at elite institutions.
In addition, we test the interaction effects of gender and institutional prestige on the performance of mid-career researchers. We find that the prestige of institutions has a relatively stronger effect on researchers’ productivity and prominence than gender, for both unadjusted measures and latent parameters (see Supplementary Fig. 12). In particular, both gender and institutional prestige have negligible effects on latent productivity λ, while institutions appear to have stronger influence than gender on latent prominence θ. The observation that prestige does not appear to drive latent productivity λ is supported by other recent studies, which show how the greater productivity of faculty at prestigious departments can be largely explained by a collaboration network effect: elite departments provide more available funded research labor, who then coauthor papers with the faculty members in their departments [58].