Abstract

Over the last year, artificial intelligence (AI), in its many manifestations, has risen to the attention of the public at large. Most of this attention has focused on ChatGPT and its cousins, which can generate near-natural text from input information, even though AI is already in near-ubiquitous use in pattern recognition and predictive algorithms. These platforms depend not only on extensive programming but also on massive data sources and substantial processing capacity.
Notably, AI applied to natural language, in models such as ChatGPT and Bard, has garnered greater attention, largely because it has entered the realm of potential daily utility in a manner that threatens the reader. This threat is somewhat existential: a fear that the source of the information is a computer rather than a human. Yet these AI platforms are ultimately built on both programming and human-derived data.
Much of the discomfort about AI in fact stems from the mystique of the process. AI and its homologs (machine learning, deep learning, and neural networks) are nothing more than unfettered computing power applied either to finite data sets for learning or to near-infinite data sets for the generation of text and, presumably, information.
AI is simply an analytic extension of data collection and analysis. Functionally, this can be traced to the first census¹ and tax system,² both of which evolved under the Egyptian pharaohs. The population, what its members did, and what they produced were enumerated and used to develop a scale of value and, subsequently, to determine a sum (tax) to be given to the rulers. Since the earliest cultures, the challenge has been twofold: collecting the data and analyzing them. Although the applications of data collection have changed dramatically, and the means of analysis have evolved concomitantly, the process is the same: data are transformed into information and knowledge.³
In science, the balance between data collection and data analysis has tilted back and forth over the last two decades. The introduction of digital imaging in astronomy, geology, radiology, histopathology, and other fields produced an excess of data that outstripped the available analytic capacity. This was counterbalanced by the advent of faster computers, which enabled faster and more complex data analysis. This back-and-forth between data acquisition and analytic capacity continued throughout those two decades. However, widespread access to cloud computing, which coincided with a dramatic drop in the cost of storage, resulted in substantial progress in the field of image analysis.
It is with the advent of cloud computing that AI has truly entered daily use. Yet most of us never appreciate that global positioning system (GPS) route planning, now adjusted for traffic, is in fact basically AI. This everyday exemplar highlights both the positives and the negatives of AI. Anyone who uses route planning daily will confess to trying to "beat the system," and it is exactly these efforts to "beat the system" that are illustrative. GPS route planning is based on a fundamental data set of roads that connect point A to point B. This data set is augmented (annotated) with details concerning speed limits, road types, tolls, and other roadway restrictions, which formed the original basis of GPS route planning. It is now further augmented, in near real time, with data obtained from users (cell phone GPS data) to generate a functional circuit map of all the options for travel from point A to point B. The success of the prediction is entirely predicated on the volume of information. Route planning in a busy urban area at rush hour draws on a near saturation of information, yielding accurate data on the common routes and, on occasion, on odd connectors that are not obvious. Yet the same technology can functionally fail on rural roads during winter storms, where the input is too limited to generate an accurate model.
We happily depend on AI-provided route planning but are suspicious of AI-driven text. The fundamentals are entirely the same: the greater the saturation of data for the generation of information, the better the quality of the product. In the setting of sparse data, however, determining the boundaries of the data that can be incorporated into the model is challenging for both humans and AI. A real-world example is the limit of weather forecasting across complex geographies of microclimates for which data are not available. The failures of natural language processing AI resemble the imaginations of young children, who insert content from their imagination, or the confabulation of cognitively impaired elderly people who cannot recall facts.
AI is a very powerful tool in an ever-growing toolbox. It actively draws on a near-infinite but often poorly curated data set, the free and nearly instantly available content of the internet, but it lacks the capacity to effectively fact-check those data. The net effect is that the "Cloud" of freely available data has replaced our most powerful repositories of the past: libraries. The fundamentals of the analytics are largely unchanged; rather, the breadth of available data and the depth of analytic capacity have increased in the setting of cloud computing, which has ultimately been the enabling element of AI.
As a scientist and editor, I encourage authors to openly use and acknowledge the power of AI and Chat-tools. As with every tool and technique, it is the author's responsibility to evaluate the tool's performance (applying controls when appropriate) and to acknowledge and reference its use. At the most granular level, if Chat-tools help authors communicate more effectively in the "language of science," AI is an effective tool that benefits both authors and readers. However, the challenge of AI is that it requires supervision and review to ensure accuracy.
