Web search engines are important online information intermediaries that are frequently used and highly trusted by the public despite multiple evidence of their outputs being subjected to inaccuracies and biases. One form of such inaccuracy, which so far received little scholarly attention, is the presence of conspiratorial information, namely pages promoting conspiracy theories. We address this gap by conducting a comparative algorithm audit to examine the distribution of conspiratorial information in search results across five search engines: Google, Bing, DuckDuckGo, Yahoo and Yandex. Using a virtual agent-based infrastructure, we systematically collect search outputs for six conspiracy theory-related queries (“flat earth”, “new world order”, “qanon”, “9/11”, “illuminati”, “george soros”) across three locations (two in the US and one in the UK) and two observation periods (March and May 2021). We find that all search engines except Google consistently displayed conspiracy-promoting results and returned links to conspiracy-dedicated websites in their top results, although the share of such content varied across queries. Most conspiracy-promoting results came from social media and conspiracy-dedicated websites while conspiracy-debunking information was shared by scientific websites and, to a lesser extent, legacy media. The fact that these observations are consistent across different locations and time periods highlight the possibility of some search engines systematically prioritizing conspiracy-promoting content and, thus, amplifying their distribution in the online environments.
Here is the full paper by Aleksandra Urman, Mykola Makhortykh, Roberto Ulloa, and Juhi Kulshrestha. Of course it is also worth investigating which search engine does the most to “censor” true conspiracy theories. Are there any?
Via Aleksandra Urman.
I don’t believe this result, but it is at least worth a ponder:
Previous research has found significant impacts of daylight and views on the cognitive function of office workers. In this study, we use scores on decision-making performance to estimate the annual economic potential of optimizing daylighting and views in U.S. offices. Cognitive scores were compared against over 100,000 previous test scores to obtain the distributional shift in cognitive performance when working in an office with optimized daylighting and views as opposed to an office with traditional blinds. These changes in performance were then compared to compensation data from the Bureau of Labor Statistics. Office workers shifted on average from the 52nd percentile to 65th percentile, equivalent to a $11,809 difference in salary per person per year. When conservatively accounting for the number of employees working within 15 ft of a window with blinds, optimizing daylight and views in U.S. offices has the potential to generate $352B ($240B–$464B), or 1.7% of the 2018 U.S. gross domestic product (GDP), in additional productivity. These findings suggest that building developers, architects and tenants should give additional attention to daylight design and façade technology as they consider new building construction, renovation and leasing options.
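For anyone who wants to ponder the numbers, here is a rough back-of-envelope check in Python. It uses only figures quoted in the abstract. The one assumption, which the abstract does not state, is that the $352B total is simply the per-worker salary gain multiplied by the number of affected workers; the function names are made up for illustration.

```python
# Back-of-envelope check of the abstract's headline figures.
# Inputs are copied from the abstract; the derived values are implied, not reported.
SALARY_GAIN_PER_WORKER = 11_809  # dollars per person per year
TOTAL_GAIN = 352e9               # dollars per year (central estimate)
GDP_SHARE = 0.017                # stated share of 2018 U.S. GDP


def implied_workers(total_gain, gain_per_worker):
    """Workers needed for the total, ASSUMING total = workers * per-worker gain."""
    return total_gain / gain_per_worker


def implied_gdp(total_gain, share):
    """GDP level implied by the total gain and its stated share of GDP."""
    return total_gain / share


workers = implied_workers(TOTAL_GAIN, SALARY_GAIN_PER_WORKER)
gdp = implied_gdp(TOTAL_GAIN, GDP_SHARE)

print(f"Implied affected workers: {workers / 1e6:.1f} million")
print(f"Implied 2018 GDP: ${gdp / 1e12:.1f} trillion")
```

Under that assumption the headline number requires roughly 30 million office workers sitting near a window with blinds, which is the figure to scrutinize if you doubt the result.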
This paper demonstrates that the measured stock of China’s holding of U.S. assets could be much higher than indicated by the U.S. net international investment position data due to unrecorded historical Chinese inflows into an increasingly popular global safe haven asset: U.S. residential real estate. We first use aggregate capital flows data to show that the increase in unrecorded capital inflows in the U.S. balance of payment accounts over the past decade is mainly linked to inflows from China into U.S. housing markets. Then, using a unique web traffic dataset that provides a direct measure of Chinese demand for U.S. housing at the zip code level, we estimate via a difference-in-difference matching framework that house prices in major U.S. cities that are highly exposed to demand from China have on average grown 7 percentage points faster than similar neighborhoods with low exposure over the period 2010-2016. These average excess price growth gaps co-move closely with macro-level measures of U.S. capital inflows from China, and tend to widen following periods of economic stress in China, suggesting that Chinese households view U.S. housing as a safe haven asset.
The prevalence of consanguineous marriage among the Arab population in Israel increased significantly from 36.3% to 41.6% in the decade from 2007 to 2017. First-cousin and closer marriages constituted about 50% of total consanguineous marriages in the two periods surveyed. Consanguinity was found to be significantly related to religion and place of residence. Thus, the prevalence of consanguineous marriage remains high among the Arab population in Israel, similar to other Arab societies.
From David McKenzie at the World Bank, here is one excerpt:
Data from big middle-income countries, and English-speaking Africa were most common, with no papers on the Middle East and North Africa, and very little study of the poorest places: In both samples, India, Brazil, and Colombia (and the U.S.!) were the most common countries studied, with a smattering of papers from East Asia, other South Asian countries, and Latin America, and one from Russia with nothing else on Eastern Europe and Central Asia. Of the World’s 25 poorest countries, only one (Mozambique) was the subject of study; of the five countries that contain half the World’s poor, there were papers on India and Bangladesh, but none on Nigeria, DRC or Ethiopia.
Here is another:
RCTs have far from overtaken development, difference-in-differences is the most popular identification method, yes, people still do IV, and no, no one does PSM on the job market: The pandemic may have reduced the ability of people to do some field experiments, but this year at least, only 20% of the top school sample, and only 6% of the World Bank sample were doing RCTs. More than one quarter in both cases were using difference-in-differences. RDD and IVs were used in about 10% of the papers each, and structural models were common in the World Bank sample (which has more trade and macro papers). None of the papers used propensity score matching.
The blog post is interesting throughout. Via the excellent Samir Varma.
The overall cost of living faced by low-income households (post-tax income <$50,000) in the most expensive city—San Jose, CA—is 49% higher than in the median commuting zone, Cleveland, and 99% higher than the most affordable commuting zone—Natchez, MS.
The three commuting zones with the lowest consumption of low-income households are San Jose, CA; San Francisco, CA; and San Diego, CA, with consumption levels between 27% and 30% lower than the median commuting zone. At the other extreme of the spectrum, examples of commuting zones with high consumption of low-income households are Huntington, WV; Johnstown, PA; and Elizabeth City, NC, with consumption levels in real terms 22–23% higher than the median commuting zone. The range of consumption levels observed across U.S. communities is quite wide: Low-income families who live in the most affordable commuting zone enjoy a level of market-based consumption measured in real terms that is 74% higher than that of families with the same income who live in the least affordable commuting zone.
The estimated coefficient implies that a middle-skill household moving from the median commuting zone (Cleveland) to the commuting zone with the highest price index (San Jose) would experience a 7.7% decline in their standard of living. Moving from the commuting zone with the lowest cost of living index (Natchez) to the commuting zone with the highest index would imply a decline in the standard of living by 12.7%.
As for high school dropouts:
Moving from Natchez to San Jose implies a 26.9% decline in the standard of living.
Here is the NBER working paper by Rebecca Diamond and Enrico Moretti.
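The percentages quoted above can be chained together with a little arithmetic. The sketch below is only a consistency check on the excerpted numbers, not the paper's method, and the helper names are hypothetical. It assumes the "X% higher" and "X% lower" figures can be treated as simple ratios to the median commuting zone.

```python
def cheapest_relative_to_median(pct_above_median, pct_above_cheapest):
    """Price index of the cheapest zone as a fraction of the median zone's,
    given how far the most expensive zone sits above each of them."""
    return (1 + pct_above_median / 100) / (1 + pct_above_cheapest / 100)


def consumption_gap(pct_above_median, pct_below_median):
    """Fractional consumption advantage of a high-consumption zone over a
    low-consumption zone, both expressed relative to the median zone."""
    return (1 + pct_above_median / 100) / (1 - pct_below_median / 100) - 1


# San Jose is 49% above the median (Cleveland) and 99% above Natchez.
natchez_vs_cleveland = cheapest_relative_to_median(49, 99)
print(f"Natchez price index relative to Cleveland: {natchez_vs_cleveland:.2f}")

# Consumption: 22-23% above the median at one end, 27-30% below at the other.
low_gap = consumption_gap(22, 27)
high_gap = consumption_gap(23, 30)
print(f"Implied consumption gap: {low_gap:.0%} to {high_gap:.0%}")
```

On these inputs Natchez comes out about a quarter cheaper than Cleveland, and the implied consumption gap brackets the 74% figure quoted in the excerpt.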
There is evidence that physicians disproportionately suffer from substance use disorder and mental health problems. It is not clear, however, whether these phenomena are causal. We use data on Dutch medical school applicants to examine the effects of becoming a physician on prescription drug use and the receipt of treatment from a mental health facility. Leveraging variation from lottery outcomes that determine admission into medical schools, we find that becoming a physician increases the use of antidepressants, opioids, anxiolytics, and sedatives, especially for female physicians. Among female applicants towards the bottom of the GPA distribution, becoming a physician increases the likelihood of receiving treatment from a mental health facility.
That is from a new NBER working paper by D. Mark Anderson, Ron Diris, Raymond Montizaan, and Daniel I. Rees. Is it personality type? Or the ease of opportunity? The stress of the job? Or something else?
We evaluate the performance of artificial intelligence (AI)-powered mutual funds. We find that these funds do not outperform the market per se. However, a comparison shows that AI-powered funds significantly outperform their human-managed peer funds. We further show that the outperformance of AI funds is attributable to their lower transaction cost, superior stock-picking capability, and reduced behavioral biases.
— Eric Topol (@EricTopol) December 4, 2021
And here is an update on patient profiles from South Africa: they don’t seem to have major oxygen problems.
The demographic transition –the move from a high fertility/high mortality regime into a low fertility/low mortality regime– is one of the most fundamental transformations that countries undertake. To study demographic transitions across time and space, we compile a data set of birth and death rates for 186 countries spanning more than 250 years. We document that (i) a demographic transition has been completed or is ongoing in nearly every country; (ii) the speed of transition has increased over time; and (iii) having more neighbors that have started the transition is associated with a higher probability of a country beginning its own transition. To account for these observations, we build a quantitative model in which parents choose child quantity and educational quality. Countries differ in geographic location, and improved production and medical technologies diffuse outward from Great Britain. Our framework replicates well the timing and increasing speed of transitions. It also produces a correlation between the speeds of fertility transition and increases in schooling similar to the one in the data.
This is all him, no double indent though:
“As a regular reader of your blog and one of the PIs of the Bangladesh Mask RCT (now in press at Science), I was surprised to see your claim that, “With more data transparency, it does not seem to be holding up very well”:
- The article you linked claims, in agreement with our study, that our intervention led to a roughly 10% reduction in symptomatic seropositivity (going from 12% to 41% of the population masked). Taking this estimate at face value, going from no one masked to everyone masked would imply a considerably larger effect. Additionally:
- We see a similar – but more precisely estimated – proportionate reduction in Covid symptoms [95% CI: 7-17%] (pre-registered), corresponding to ~1,500 individuals with Covid symptoms prevented
- We see larger proportionate drops in symptomatic seropositivity and Covid in villages where mask-use increased by more (not pre-registered), with the effect size roughly matching our main result
The naïve linear IV estimate would be a 33% reduction in Covid from universal masking. People underwhelmed by the absolute number of cases prevented need to ask, what did you expect if masks are as effective as the observational literature suggests? I see our results as on the low end of these estimates, and this is precisely what we powered the study to detect.
- Let’s distinguish between:
- a) The absolute reduction in raw consenting symptomatic seropositives (20 cases prevented)
- b) The absolute reduction in the proportion of consenting symptomatic seropositives (0.08 percentage points, or 105 cases prevented)
- c) The relative reduction in the proportion of consenting symptomatic seropositives (9.5% in cases)
Ben Recht advocates analyzing a) – the difference in means not controlling for population. This is not the specification we pre-registered, as it will have less power due to random fluctuations in population (and indeed, the difference in raw symptomatic seropositives overlooks the fact that the treatment population was larger – there are more people possibly ill!). Fixating on this specification in lieu of our pre-registered one (for which we powered the study) is reverse p-hacking.
RE: b) vs. c), we find a result of almost identical significance in a linear model, suggesting the same proportionate reduction if we divide the coefficient by the base rate. We believe the relative reduction in c) is more externally valid, as it is difficult to write down a structural pandemic model where masks lead to an absolute reduction in Covid regardless of the base rate (and the absolute number in b) is a function of the consent rate in our study).
- It is certainly true that survey response bias is a potential concern. We have repeatedly acknowledged this shortcoming of any real-world RCT evaluating masks (that respondents cannot be blinded). The direction of the bias is unclear — individuals might be more attuned to symptoms in the treatment group. We conduct many robustness checks in the paper. We have now obtained funding to replicate the entire study and collect blood spots from symptomatic and non-symptomatic individuals to partially mitigate this bias (we will still need to check for balance in blood consent rates with respect to observables, as we do in the current study).
- We do not say that surgical masks work better than cloth masks. What we say is that the evidence in favor of surgical masks is more robust. We find an effect on symptomatic seropositivity regardless of whether we drop or impute missing values for non-consenters, while the effect of cloth masks on symptomatic seropositivity depends on how we do this imputation. We find robust effects on symptoms for both types of masks.
I agree with you that our study identifies only the medium-term impact of our intervention, and there are critically important policy questions about the long-term equilibrium impact of masking, as well as how the costs and benefits scale for people of different ages and vaccination statuses.”
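For readers wondering where a figure like the 33% in the letter could come from, here is a minimal sketch of a naive linear IV (Wald) calculation: divide the intention-to-treat effect by the change in mask uptake. The inputs are the numbers quoted in the letter (a 9.5% relative reduction, and masking going from 12% to 41%). The function name is hypothetical, and this is one reading of the arithmetic rather than the authors' actual specification.

```python
def wald_estimate(itt_effect, uptake_control, uptake_treated):
    """Naive linear IV (Wald) estimate: scale the intention-to-treat effect
    by the first stage, i.e. the change in the share of people masked."""
    first_stage = uptake_treated - uptake_control
    if first_stage <= 0:
        raise ValueError("treatment must raise uptake for the ratio to make sense")
    return itt_effect / first_stage


# Figures as quoted in the letter above.
relative_reduction = 0.095   # 9.5% fewer symptomatic seropositive cases
masked_control = 0.12        # share masked without the intervention
masked_treated = 0.41        # share masked with the intervention

universal_masking_effect = wald_estimate(
    relative_reduction, masked_control, masked_treated
)
print(f"Implied reduction from universal masking: {universal_masking_effect:.0%}")
```

The linear scaling is the "naive" part: it assumes each additional percentage point of masking buys the same proportionate reduction all the way from zero to universal masking.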
In a unique sample of 394 adoptive and biological families with offspring more than 30 years old, biometric modeling revealed significant evidence for genetic and nongenetic transmission from both parents for the majority of seven political-attitude phenotypes. We found the largest genetic effects for religiousness and social liberalism, whereas the largest influence of parental environment was seen for political orientation and egalitarianism. Together, these findings indicate that genes, environment, and the gene–environment correlation all contribute significantly to sociopolitical attitudes held in adulthood, and the etiology and development of those attitudes may be more important than ever in today’s rapidly changing sociopolitical landscape.
India’s most recent National Family Health Survey, which is conducted every five years by the Health Ministry, was released Wednesday and showed the total fertility rate (TFR) across India dropping to 2.0 in 2019-2021, compared with 2.2 in 2015-2016. A country with a TFR of 2.1, known as the replacement rate, would maintain a stable population over time; a lower TFR means the population would decrease in the absence of other factors, such as immigration…
In cities across India — as in other countries — women are opting for fewer children: The urban fertility rate is 1.6.
And the bias against baby girls is diminishing. Here is the full story. Via Naveen.
That is the topic of my latest Bloomberg column, here is one excerpt:
Maybe the men, on average, did have greater ambition and thus promotion potential. One reason could be that women, on average, spend more time at home raising children than men. For very demanding executive jobs, even a small difference in time and travel availability could make a big difference in job performance.
And yet even if that’s the case, there could still be a discrimination problem. Even if women and men differ on average, there is a probability distribution for each group, and those distributions usually will overlap. That is, there will be many women who are willing and able to meet any workplace standard thrown at them, and many men with limited ambition.
If you think men and women are different on average, the unfairness can become all the more severe for the potential top performers. In this context, employers will look at the most talented women and, for reasons of stereotyping, dramatically underestimate their potential, including for leadership positions.
Economic reasoning suggests another subtle effect at play. Promotion to the top involves a series of steps along a career ladder, often many steps. If there is a discrimination “tax” at each step, even if only a small one, those taxes can produce a discouraging effect. It resembles the old problem of the medieval river that has too many tolls on it, levied by too many independent principalities. The net effect can be to make the river too costly to traverse, even if each prince is taking only a small amount.
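The toll analogy is easy to put into numbers. Here is a small sketch of how a per-step "tax" compounds along a career ladder; the 5% tax and the ten steps are made-up illustrative values, not estimates from the column, and the function name is hypothetical.

```python
def surviving_share(tax_per_step, steps):
    """Share of the original value left after paying the same proportional
    tax at every step, assuming the taxes compound multiplicatively."""
    if not 0 <= tax_per_step <= 1:
        raise ValueError("tax_per_step must be between 0 and 1")
    return (1 - tax_per_step) ** steps


# Hypothetical illustration: a 5% toll at each of 10 promotion steps.
remaining = surviving_share(0.05, 10)
print(f"Share remaining after 10 steps: {remaining:.1%}")
```

With these hypothetical numbers, a toll that looks small at any single step removes about 40% of the value by the top of the ladder, which is the river problem in miniature.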
With a citation to Zaua further below!
- This report uses natural language processing to analyze the abstracts of successful grants from 1990 to 2020 in the seven fields of Biological Sciences, Computer & Information Science & Engineering, Education & Human Resources, Engineering, Geosciences, Mathematical & Physical Sciences, and Social, Behavioral & Economic Sciences.
- The frequency of documents containing highly politicized terms has been increasing consistently over the last three decades. As of 2020, 30.4% of all grants had one of the following politicized terms: “equity,” “diversity,” “inclusion,” “gender,” “marginalize,” “underrepresented,” or “disparity.” This is up from 2.9% in 1990. The most politicized field is Education & Human Resources (53.8% in 2020, up from 4.3% in 1990). The least are Mathematical & Physical Sciences (22.6%, up from 0.9%) and Computer & Information Science & Engineering (24.9%, up from 1.5%), although even they are significantly more politicized than any field was in 1990.
- At the same time, abstracts in most directorates have been becoming more similar to each other over time. This arguably shows that there is less diversity in the kinds of ideas that are getting funded. This effect is particularly strong in the last few years, but the trend is clear over the last three decades when a technique based on word similarity, rather than the matching of exact terms, is used.
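To make the term-frequency measure concrete, here is a minimal sketch of counting the share of abstracts that contain at least one of the listed terms. The term list comes from the report summary above. Everything else is an assumption: the toy abstracts are invented, and the matching rule (case-insensitive, whole-word prefix match) is a guess, since the summary does not say how the report handled word variants.

```python
import re

# Term list as given in the report summary.
TERMS = [
    "equity", "diversity", "inclusion", "gender",
    "marginalize", "underrepresented", "disparity",
]

# ASSUMED matching rule: a word that starts with one of the terms,
# ignoring case (so "marginalized" matches "marginalize").
PATTERN = re.compile(r"\b(?:" + "|".join(TERMS) + r")\w*", re.IGNORECASE)


def share_with_terms(abstracts):
    """Fraction of abstracts containing at least one listed term."""
    if not abstracts:
        return 0.0
    hits = sum(1 for text in abstracts if PATTERN.search(text))
    return hits / len(abstracts)


# Invented toy abstracts, for illustration only.
toy_abstracts = [
    "We study gender gaps in STEM enrollment.",
    "A new algorithm for sparse matrix factorization.",
    "Broadening participation of underrepresented groups in computing.",
    "Seismic imaging of the lower mantle.",
]

share = share_with_terms(toy_abstracts)
print(f"Share of toy abstracts with a listed term: {share:.0%}")
```

Note that a prefix rule like this one misses some variants (for example "disparities"), so the exact share depends on matching choices that the summary leaves unspecified.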