As discussed in the problem statement, the proposed solution is to educate and raise awareness of this topic, in the hope that software and insurance companies will review their algorithms and implement the necessary changes. The solution may also spark interest among younger audiences in finding answers to this problem, and they may carry the work forward by researching the topic further in their studies. Ultimately, the solution aims to increase awareness of the issue in order to generate changes that would help racialized groups access the medical attention they require.

This solution was chosen because it has proved effective in the past. For example, Ziad Obermeyer, a health policy researcher at the University of California, Berkeley, and his colleagues conducted a study in which they examined one risk prediction program and discovered racial disparities in its results. When the team sent their findings to the company, the company committed to correcting its model. It should be noted that there are limitations to this type of solution; research shows that instruction and awareness alone are usually not enough to change human behaviour. An article from the University of Utah’s health faculty on why behaviour change is so difficult, specifically with regard to health, supports this limitation: it describes six stages of change, of which only the first two concern becoming aware of the issue, while the later stages of preparation, action, and maintenance require more than awareness. 1 Nevertheless, controversies such as racial bias in healthcare are often uncomfortable for society to acknowledge. Raising awareness was therefore judged to be the most realistic solution because it aims to gradually break down that barrier.

To properly evaluate this topic, an email interview with a machine learning engineer specializing in the healthcare industry was conducted as primary research. The interviewee wishes to remain anonymous since they reveal information about their current work; they will therefore be referred to as “ML engineer” throughout. “Raising awareness” is a very broad term, and many existing solutions are similar to this one. This particular solution therefore focuses on covering some of the gaps in previous articles, as well as discussing the general problems found. Specifically, the two areas considered are the reliability of the software and the equality issues surrounding the public’s access to it. The following text is a proposed article to educate readers about the biases surrounding predictive algorithms in healthcare.

Solution: Article

There are often flaws in predictive algorithms employed in North American healthcare systems.
The aforementioned Optum incident from the problem statement illustrates the inadequacy of these systems’ reliability and integrity, where reliability refers to the operation of the software and the accuracy of its data, and integrity refers to safeguarding the accuracy and completeness of stored data: “When the company replicated the analysis on a national data set of 3.7 million patients, they found that black patients who were ranked by the algorithm as equally as in need of extra care as white patients were much sicker: They collectively suffered from 48,772 additional chronic diseases.” 2 In this example, the seemingly correct data lacks integrity because it under-represents the large, at-risk minority population that cannot afford healthcare. As a result, the system is not entirely reliable in identifying vulnerable patients.
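For technically inclined readers, a disparity of this kind can be checked with a simple audit: among patients the algorithm assigns the same high risk score, compare the burden of chronic disease across racial groups. The sketch below is a minimal illustration in Python on a synthetic patient table; the column names (risk_score, race, n_chronic_conditions) are hypothetical, and this is not the method used in the cited study.

import pandas as pd

# Minimal audit sketch: at a fixed risk-score cutoff, do patients from
# different groups carry the same burden of chronic disease?
def audit_equal_scores(df: pd.DataFrame, cutoff: float = 0.9) -> pd.Series:
    """Mean chronic-condition count per group among patients flagged
    as equally high risk (score >= cutoff)."""
    flagged = df[df["risk_score"] >= cutoff]
    return flagged.groupby("race")["n_chronic_conditions"].mean()

# Synthetic example: equal scores, unequal underlying illness.
patients = pd.DataFrame({
    "risk_score":           [0.95, 0.96, 0.95, 0.96],
    "race":                 ["black", "black", "white", "white"],
    "n_chronic_conditions": [5, 6, 3, 2],
})
print(audit_equal_scores(patients))
# If one group is consistently sicker at the same score, the score is not
# measuring health need equally across groups.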
Unfortunately, the solution to fairness is not simple. ML engineer says, “It is however a sad mathematical fact that if we enforce that the model is ‘fair’ then we will decrease the overall performance.” Furthermore, it is an open question in fair machine learning what the correct, or even acceptable, choice is with respect to ethics. With that said, ML engineer remarks that these systems are nevertheless fairly accurate despite their unavoidable flaws because “the deployed models would be subject to scrutiny that no doctors would ever be subjected to.”

Access to technology is often associated with wealth. Unfortunately, this carries over into the healthcare industry and predictive analytics. In North America, the cost of insurance, along with other sociodemographic characteristics, is a key factor that can cause a digital divide. When a group in the population has difficulty accessing the healthcare services it requires, predictive algorithms begin to produce biases due to the lack of data from the absent population. This poses a risk of inequality in access that hinders patients from receiving suitable healthcare.

In the United States, residents either purchase health insurance in order to receive healthcare services or receive varying provision of insurance from the government (Medicaid). The division between patients with private insurance and patients with Medicaid can create a difference in their ability to access hospitals and clinics, and to be represented in their databases, which in turn produces a bias in predictive software systems. An analysis of data from Boston University’s radio station (WBUR) shows that the statewide average share of Medicaid patients in Massachusetts hospitals was 18%, while the majority of patients had private insurance providers. Hospitals accept more of the latter because private insurance companies “pay more, often twice as much or more, than does Medicaid for the same appointment, test or procedure.” Hence, an increase in free-care and Medicaid patients at a hospital could cost the hospital tens of millions of dollars. 3

For poorer patients who are unable to afford private insurance, this often means fewer hospital visits due to high medical costs. Alongside costly visits, this group may also have difficulty accessing healthcare services because of other obstacles: people forced to work night shifts to earn a decent income may struggle to find time for necessary medical services, and poverty can also mean less accessible transportation and thus an inability to travel to medical appointments. Table B-5 (Poverty Status of People by Family Relationship, Race, and Hispanic Origin: 1959 to 2019) from the United States Census Bureau shows that Black populations had the highest poverty rate (18.8% in 2019) of all racial groups, whereas White (9.1% in 2019) and Asian populations had the lowest rates. 4 This data suggests that Medicaid insures more Black patients than any other racial group. Because Black patients consequently make fewer hospital visits, this pattern is reflected in their electronic health records (EHRs), and predictive algorithms replicate the racial bias in the data through the unreliable assumption that the number of hospital visits correlates directly with health. This consequently produces risk scores that under-represent the actual average health conditions of Black populations.
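To make this proxy effect concrete, the short simulation below is a minimal sketch in Python using entirely synthetic data and assumed numbers: two groups have the same distribution of underlying health need, but one faces access barriers, so less of its need shows up as recorded visits and costs. A score based on that recorded spending then ranks the lower-access group as healthier than it really is.

import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Synthetic illustration (not real data): groups A and B have the SAME
# distribution of true health need, but group B faces access barriers,
# so less of its need is captured as hospital visits and costs.
true_need = rng.normal(loc=5.0, scale=1.0, size=n)          # unobserved need
group_b = rng.random(n) < 0.5                               # 50/50 split
access = np.where(group_b, 0.6, 1.0)                        # B uses care less
observed_cost = true_need * access + rng.normal(0, 0.2, n)  # proxy label

# A model trained to predict cost would simply learn this proxy; for
# clarity, use the observed cost itself as the "risk score".
risk_score = observed_cost

print("Mean true need  - A: %.2f, B: %.2f"
      % (true_need[~group_b].mean(), true_need[group_b].mean()))
print("Mean risk score - A: %.2f, B: %.2f"
      % (risk_score[~group_b].mean(), risk_score[group_b].mean()))
# Equal need, lower scores for group B: ranking patients by this score
# would direct proportionally fewer group-B patients to extra care.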
Viewed broadly, inaccurate results can have deep negative impacts on a society, often in the form of feedback loops. For example, hospitals in a community may be allocated towards areas where medical services are predicted to be more in demand based on generally higher insurance payments. However, these areas are usually affluent neighborhoods whose patients, often from a particular racial group, are wealthy enough to afford private insurance. This raises concerns about equal access to IT in hospitals for poorer minority groups, which leads back to the initial problem of uninformed databases.

A similar analysis can be made for Canadians, who have publicly funded healthcare with the exception of prescribed medicine. In most cases, a patient’s prescribed medicines are paid for by their private insurance company. As in the United States, the highest poverty rate in Canada is held by Black people 5, indicating that Black Canadians have much less access to private insurance than other racial groups. Black Canadians are therefore more likely, on average, to have lower medicine costs recorded in their electronic health records than White Canadians. By the same reasoning applied to Black Americans, predictive algorithms could thus produce a racial bias if preventive measures are not embedded in the systems to reduce such disparities.

Despite the aforementioned risks, it should be kept in mind that artificial intelligence was created to reliably predict outcomes from patterns in the data. AI and machine learning techniques allow data from existing patients of the same racial group to be used to help make predictions about related, incomplete health records. It must therefore be emphasized that predictive algorithms should be viewed only as additional data points that can support a clinician’s decision. Only under these circumstances can the software be considered a helpful resource for patients and staff alike.

Footnotes
1 Call, M. (2020, February 14). Why is behavior change so hard? Retrieved November 02, 2021, from https://accelerate.uofuhealth.utah.edu/resilience/why-is-behavior-change-so-hard
2 Johnson, Carolyn Y. “Racial Bias in a Medical Algorithm Favors White Patients over Sicker Black Patients.” The Washington Post, WP Company, 25 Oct. 2019, www.washingtonpost.com/health/2019/10/24/racial-bias-medical-algorithm-favors-white-patients-over-sicker-black-patients/.
3 Bebinger, Martha. “Inside Boston Hospitals, A Reckoning With Racism.” CommonHealth, WBUR, 19 June 2020, www.wbur.org/commonhealth/2020/06/19/residents-petition-mass-general-brigham.
4 Semega, Jessica, et al. “Income and Poverty in the United States: 2019.” The United States Census Bureau, 15 Sept. 2020, www.census.gov/library/publications/2020/demo/p60-270.html.
5 “Data Tables, 2016 Census.” Visible Minority (15), Individual Low-Income Status (6), Low-Income Indicators (4), Generation Status (4), Age (6) and Sex (3) for the Population in Private Households of Canada, Provinces and Territories, Census Metropolitan Areas and Census Agglomerations, 2016 Census - 25% Sample Data, 17 June 2019, www12.statcan.gc.ca/census-recensement/2016/dp-pd/dt-td/Rp-eng.cfm?TABID=2&Lang=E&APATH=3&DETAIL=0&DIM=0&FL=A&FREE=0&GC=0&GID=1341679&GK=0&GRP=1&PID=110563&PRID=10&PTYPE=109445&S=0&SHOWALL=0&SUB=0&Temporal=2017&THEME=120&VID=0&VNAMEE=&VNAMEF=&D1=0&D2=0&D3=0&D4=0&D5=0&D6=0.
Appendix

Email interview
Date: October 23, 2021
Interviewee: A machine learning engineer specializing in the healthcare industry, who wishes to remain anonymous under the name “ML engineer”

The following are the questions emailed to ML engineer, followed by their response.

1. What are some widely used predictive healthcare algorithms (regarding risk scores) in Canadian hospitals and clinics?
2. How reliable do you think they are in terms of being able to accurately predict risk scores?
3. Do these systems often have errors? If so, what types of errors exist and how frequently do they occur?
4. How does the technology work?
5. What factors are considered when developing these algorithms (e.g. a patient’s gender, history of medical costs, etc.)?
6. What type of patient data is considered input data?
7. How many prototypes/edits/editors does this type of software have to go through before it’s finalized for public use?
8. When developing the software, how do you determine what moral basis to lean on in order for the software to give fair results?

I work mostly on things that are non-diagnostics related. My only exposure to patient scores is through some of my research work (Multiple Sclerosis Severity Classification From Clinical Text, at the 3rd Clinical Natural Language Processing Workshop at EMNLP 2020), which is focused on a single disease at a single Canadian clinic. I am not aware of any widely used predictive algorithm that is deployed in a Canadian hospital, and I know that the hurdles to development and deployment are significant. Most of the deployed algorithms deal with automating things that don't affect patient outcomes, as those usually don't have the same regulatory hurdles. My research work was in that space: trying to identify the severity of Multiple Sclerosis from encounter notes taken by clinicians. This enables people who want to conduct research or analyze data to avoid manually labeling thousands of patient encounters and instead use an algorithm to do it for them.

The regulatory hurdles for deployment of an algorithm that has a direct effect on patient care are still a bit “fuzzy”. I have never seen a document outlining all the considerations that would go into approval in Canada, but I know that they require a similar process as new drugs (FDA trials) in the US, and I would expect something similar here. The general trend in clinical machine learning is similar to that in driverless cars, where the algorithm needs to be significantly safer and more effective than a human, not just improve by a bit, to even be considered for practical deployment. The most common way people consider to get around the hurdles is to serve the algorithm as an additional data point to a clinician and leave the decision to them, but I'm not sure how that works in practice with our regulations here in Canada.

As far as errors go, machine learning models have a tendency to amplify bias already present in data, and there is an entire field of research around assessing and removing biases. The deployed models would be subject to scrutiny that no doctors would ever be subjected to.
For research and development, there is a clear lack of publicly available clinical data on which algorithms can be assessed (MIMIC-III is the only dataset I am aware of), which might lead to problems, but those would be addressed before actual deployment through a thorough assessment on the data at the hospital. However, this is a big limitation for us, as the research process often relies on this single data source.

Usually the fairness assessment is done by evaluating model performance on sub-populations (i.e. is the model as accurate on black people as on white people, etc.). It is however a sad mathematical fact that if we enforce that the model is “fair” then we will decrease the overall performance. It is an open question in Fair Machine Learning what the “correct” or even “acceptable” choice is with respect to ethics.
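The sub-population evaluation ML engineer describes can be pictured with a short, purely hypothetical sketch in Python; the table, column names, and values below are invented, and the metric shown is plain per-group accuracy.

import pandas as pd
from sklearn.metrics import accuracy_score

# Illustrative per-group fairness check on a tiny, made-up prediction table.
results = pd.DataFrame({
    "group":  ["black", "black", "white", "white", "white"],
    "y_true": [1, 0, 1, 0, 1],
    "y_pred": [1, 1, 1, 0, 1],
})

# Compute the same metric separately for each sub-population and compare.
for name, g in results.groupby("group"):
    print(name, accuracy_score(g["y_true"], g["y_pred"]))
# A large gap between groups is the kind of amplified bias described in
# the interview; narrowing it may trade off overall performance.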