
Abstract

Psychometric scales are foundational to market research, yet their development is traditionally resource-intensive, slow, and prone to cognitive and cultural bias. Item generation, the stage where abstract constructs are translated into candidate measures, is especially demanding. This paper introduces a methodological framework for integrating generative artificial intelligence (AI) into item generation, illustrated through a proof-of-concept study on the entrepreneurial mindset. We outline a structured human–AI workflow that combines large language model outputs with critical researcher oversight at key decision points. The process demonstrates how AI can rapidly expand item pools, reduce redundancy, and surface alternative phrasings, while expert validation safeguards theoretical fidelity and conceptual clarity. Our framework is both rigorous and adaptable. It provides detailed replication guidance, highlights ethical and practical considerations, and identifies how researchers can transparently document AI use. By showing how AI-assisted item generation can enhance efficiency without compromising validity, this paper rethinks scale development methods for an era where market research must be both conceptually robust and responsive to fast-changing contexts.

Keywords

generative AI, psychometric scale development, market research methodologies, human-AI collaboration, entrepreneurial mindset

Introduction

Psychometric scales, once authored exclusively by researchers, can now be co-developed with generative AI. Generative artificial intelligence (AI), particularly large language models (LLMs), is reshaping how knowledge is produced and validated across disciplines. For management and marketing scholars, the implications are transformative. Psychometric scales have long been recognised as essential tools for measuring latent constructs in marketing and management research (Gerbing & Anderson, 1988; Morgado et al., 2018; Nunnally & Bernstein, 1994). Today, these instruments can also be developed using generative algorithms to augment traditional researcher-led approaches.

Psychometric scale development has always been central to advancing theory; however, it remains slow, resource-intensive, and vulnerable to conceptual and methodological biases (DeVellis & Thorpe, 2016; Hinkin, 1998; Morgado et al., 2018). Traditional methods safeguard rigour, but at the cost of agility (DeVellis & Thorpe, 2016). In contemporary market research, where consumer preferences and cultural trends evolve rapidly, these constraints sharpen the longstanding paradox between conceptual robustness and adaptability. Recent studies indicate that generative AI offers tools to navigate, though not eliminate, this challenge (DeVellis & Thorpe, 2016; Grassini, 2023; Hoffmann et al., 2024).

Generative AI brings this methodological tension into sharper focus. LLMs can rapidly produce diverse and theoretically anchored item pools, refine phrasing for clarity, and detect redundancies (Beghetto et al., 2025; Hoffmann et al., 2024; Russell-Lasalandra et al., 2024). Early evidence suggests that AI-generated items approach the quality of human-authored ones (e.g. Görgülü et al., 2025), with development timelines reduced from weeks to days. However, these gains risk being offset by challenges of transparency, inclusivity, and researcher agency.

Despite growing interest in AI for research methods, detailed methodological guidance remains scarce (Behrend & Landers, 2025; Grassini, 2023; Hoffmann et al., 2024). While some studies outline AI’s potential across the entire scale development process, the item generation stage, widely recognised as the most cognitively demanding and consequential for translating theory into measurement (Hinkin, 1998; Morgado et al., 2018), remains underexplored. This paper does not seek to publish a fully validated scale; rather, it offers a methodological deep dive into AI-assisted item generation, illustrated through the entrepreneurial mindset as a proof-of-concept. Full validation analyses, including factor structure, reliability, and cross-cultural replication, will be reported separately.

This paper makes both a practical and a theoretical contribution. Practically, it offers researchers a replicable workflow for embedding AI responsibly into item generation. Theoretically, it reconceptualises item generation, not as a traditionally opaque process reliant on expert judgement, but as one that can be rendered transparent, auditable, and open to scrutiny. Together, these contributions highlight how psychometrics may evolve in an AI era, where efficiency gains are balanced with the preservation of conceptual integrity.

Psychometric Scale Development and AI in Market Research

Psychometric scales are indispensable in market research, providing the foundation for measuring latent constructs (Gerbing & Anderson, 1988; Nunnally & Bernstein, 1994), such as consumer trust (Morgan & Hunt, 1994), brand loyalty (Chaudhuri & Holbrook, 2001), customer engagement (Hollebeek et al., 2014), and service quality (Parasuraman et al., 1988). By rendering complex psychological and behavioural phenomena measurable, scales enable researchers to generate insights that inform marketing strategies and organisational decision-making.

The traditional scale development process is rooted in robust, well-established frameworks that emphasise conceptual clarity, methodological rigour, and transparency. Seminal contributions outline the critical stages of psychometric robustness, including construct conceptualisation, item generation, scale refinement, and validation (e.g. Churchill Jr., 1979; Clark & Watson, 1995; Dawis, 1987; Hinkin, 1998; Netemeyer et al., 2003; Nunnally & Bernstein, 1994). Although differences exist across these frameworks, they converge on seven core stages of development, summarised in Table 1, and remain the methodological standard across psychology, management, and marketing research.

Table 1.

Comparison of Traditional and AI-Assisted Psychometric Scale Development

Stage	Traditional psychometric scale development	AI-assisted tools in psychometric scale development
1. Conceptualization	Defines the construct, dimensions, and theoretical underpinnings. Involves literature review to clarify distinctions and ensure alignment (Churchill Jr., 1979; Loevinger, 1957; Nunnally & Bernstein, 1994)	AI assists in literature review, identifying emerging constructs and dimensions more quickly, and supports generative approaches to conceptualization (Beltagy et al., 2019; Chen et al., 2021; Hoffmann et al., 2024)
2. Item generation	Creates a pool using deductive (theory-based) or inductive (qualitative data) methods, followed by expert review for clarity and redundancy (Churchill Jr., 1979; Hinkin, 1998; Morgado et al., 2018)	AI tools automate item generation by analysing large datasets, creating diverse, relevant items quickly and iteratively improving item quality (Grassini, 2023; Hoffmann et al., 2024; Rossi et al., 2024)
3. Item reduction	Refines items using exploratory factor analysis (EFA) to identify the underlying structure, removing items with poor consistency or low loadings. Item-total correlations and Cronbach’s alpha ensure reliability (Hinkin, 1998; Netemeyer et al., 2003; Nunnally & Bernstein, 1994)	AI flags redundant, unclear, or low-performing items in real time, optimizing selection for clarity and relevance. AI-enhanced models improve item reduction efficiency (Gao et al., 2023; Hoffmann et al., 2024)
4. Data collection	Gathers empirical data via surveys or interviews, ensuring a representative sample for generalizability. Sample representativeness ensures reliable analysis, such as EFA and CFA (Churchill Jr., 1979; DeVellis & Thorpe, 2016; Morgado et al., 2018)	AI automates data collection through adaptive testing, dynamically adjusting to participant responses and enhancing engagement. It improves sampling efficiency and personalizes the participant experience (Hoffmann et al., 2024)
5. Factor analysis	Validates the scale structure through EFA and CFA. EFA identifies constructs, while CFA confirms structure using fit indices (DeVellis & Thorpe, 2016; Hinkin, 1998; Netemeyer et al., 2003)	AI automates EFA and CFA, providing real-time updates to factor models as data accumulates. AI accelerates the validation process by adjusting models dynamically (Chen et al., 2021; Grassini, 2023; Hoffmann et al., 2024)
6. Reliability & validity testing	Assesses reliability (e.g., Cronbach’s alpha) and validity (convergent, discriminant, and predictive) to ensure scale precision (DeVellis & Thorpe, 2016; Hinkin, 1998; Netemeyer et al., 2003)	AI continuously tests reliability and validity, using automated methods for real-time assessment and detection of inconsistencies. Generative AI can evaluate multiple validity types in parallel (Hoffmann et al., 2024; Macdonald et al., 2023; Zhuo et al., 2023)
7. Replication	Confirms the scale’s robustness by testing with new samples to ensure generalizability and stability of the factor structure (DeVellis & Thorpe, 2016; Hinkin, 1998; Morgado et al., 2018)	AI simulates new populations and conditions to validate scale robustness, significantly reducing the need for extensive manual replication (Dasigi et al., 2021; Hoffmann et al., 2024; Macdonald et al., 2023)

These frameworks were conceived in a slower-paced research environment, where constructs such as brand loyalty or service quality were relatively stable, and multi-year scale development projects were feasible. In contemporary market research, however, consumer preferences, digital platforms, and cultural trends shift rapidly. Traditional processes, with their cycles of literature review, expert evaluation, and repeated empirical testing, are increasingly out of step with practice. They are resource-intensive, slow, and vulnerable to incomplete reporting, especially at the item generation stage, where methodological opacity often limits replication and adaptation across studies (DeVellis & Thorpe, 2016; Morgado et al., 2018). At the same time, construct definitions and item wording are shaped by human cognitive and cultural biases, which can compromise clarity, inclusivity, and reliability (Hinkin, 1998; Morgado et al., 2018).

This creates a paradox at the heart of psychometric research. The very rigour that ensures validity often undermines timeliness, while the responsiveness required in practice threatens conceptual precision. For market researchers facing globalised, digitised, and rapidly shifting consumer environments, the challenge is not simply to accelerate scale development but to reimagine its methodological foundations.

Generative AI enters precisely at this juncture. By streamlining item generation and expanding conceptual coverage, it offers tools to manage, though not eliminate, the tension between methodological rigour and practical responsiveness (Beghetto et al., 2025; Hoffmann et al., 2024; Russell-Lasalandra et al., 2024). Its role is not merely to accelerate existing methods. If candidate items are produced by machines rather than human theorists, questions of validity, originality, and researcher agency must be reopened. Equally, transparency becomes critical. Researchers need to show how prompts, models, and iterations influence the resulting item sets, so that processes remain auditable and replicable. The opportunity, therefore, is not only to improve efficiency but also to reconsider how constructs are operationalised, validated, and adapted in an AI era.

Transforming Scale Development with AI

Table 1 juxtaposes the seven traditional stages of scale development with the ways AI-driven methodologies diverge from them. At each stage, AI promises not only to reduce time and resource requirements but also to challenge assumptions about how theoretical constructs are operationalised. Tools such as OpenAI’s ChatGPT, Anthropic’s Claude, Google’s Gemini, and Meta’s LLaMA now leverage neural networks and natural language processing (NLP) to automate tasks once seen as the exclusive domain of expert judgement, including literature synthesis, item generation, and refinement (Beghetto et al., 2025; Grassini, 2023; Hoffmann et al., 2024; Russell-Lasalandra et al., 2024).

The impact begins with conceptualisation. Traditionally, this stage depends on manual literature reviews and expert consensus to define and structure constructs (Churchill Jr., 1979; Loevinger, 1957; Nunnally & Bernstein, 1994). Recent AI-enabled approaches extend this capacity by scanning large corpora to identify emerging constructs, map theoretical clusters, and flag contradictions (Grassini, 2023; Hoffmann et al., 2024; Russell-Lasalandra et al., 2024). Dedicated research assistants such as Elicit, Consensus, and ResearchRabbit further enable thematic clustering and cross-study comparisons, opening possibilities for more comprehensive and transparent conceptualisation. This shifts the epistemic balance. Conceptual clarity is no longer solely the product of expert synthesis but becomes partially machine-mediated, raising questions about authority, originality, and researcher agency.

Currently, AI’s most significant contribution lies in item generation, historically the most resource-intensive and opaque stage. Traditional approaches combine deductive and inductive logics, producing items from theory and qualitative data before subjecting them to expert review (Churchill Jr., 1979; Hinkin, 1998; Morgado et al., 2018). Cognitive limitations often result in redundancy and uneven construct coverage, and reporting of this stage is often incomplete, limiting transparency and replication across studies (DeVellis & Thorpe, 2016; Morgado et al., 2018). Generative AI, by contrast, can rapidly produce large, diverse, and theoretically anchored item pools (Beghetto et al., 2025; Russell-Lasalandra et al., 2024). It can refine language for clarity, flag redundancies, anticipate response patterns, and support translation across cultural contexts (Grassini, 2023). This automation broadens both the scale and inclusivity of item generation, but it also shifts the researcher’s role from originator to curator. Validity becomes a hybrid achievement of human–AI collaboration, demanding new norms of disclosure about prompt design, model selection, and filtering procedures.

Beyond item generation, AI also has potential to reshape validation and analysis. At the questionnaire design stage, adaptive logic can tailor question paths to respondents, reducing fatigue and enhancing data quality (Clickup, 2024; Hoffmann et al., 2024; infodesk, 2024). Synthetic datasets are already being used in market research practice to pre-test scales and identify weaknesses before field deployment (e.g., NielsenIQ, 2024; Parmar, 2024), suggesting a methodological opportunity for academic psychometrics. During administration, AI-enabled platforms dynamically manage recruitment and stratification, ensuring representation and higher response rates (e.g., SurveySensum, 2024). In analysis, AI can support statistical routines such as factor analysis and reliability testing, offering efficiency gains while raising questions of transparency and researcher agency (Beghetto et al., 2025; Hoffmann et al., 2024; Russell-Lasalandra et al., 2024). The result is not simply a faster pipeline but a methodological environment where simulation, automation, and adaptation become integral to scale development practice.

Taken together, these innovations point to a paradigm shift. Traditional frameworks privileged rigour and transparency, often at the cost of speed. AI introduces the prospect of combining rigour with agility, but it also embeds new dependencies, including opacity of algorithms, risks of homogenisation, and the amplification of bias. For psychometricians and marketing researchers alike, AI should not be viewed merely as a technical supplement but as a catalyst for rethinking what it means to build, test, and validate scales. The ethical and practical consequences of this transformation are considered later in this paper, where we outline safeguards for responsible adoption.

Methodological Framework for AI-Assisted Item Generation

Item generation has long been regarded as the most cognitively demanding stage of psychometric scale development. It requires balancing theoretical precision with linguistic nuance, ensuring construct coverage without redundancy, and anticipating how items will be interpreted across diverse respondents and contexts. Traditional methods rely heavily on human expertise, iterative consensus-building, and qualitative input. While robust, these approaches are resource-intensive, shaped by researcher bias, and often under-reported in ways that undermine replication and adaptation across studies (Hinkin, 1998; Morgado et al., 2018).

Generative AI alters this methodological terrain. Its ability to rapidly generate large, diverse, and theoretically anchored item sets allows researchers to break through long-standing constraints of time, cost, and cognitive limits (Beghetto et al., 2025; Hoffmann et al., 2024; Russell-Lasalandra et al., 2024). AI does not simply accelerate an existing process. It redefines the locus of creativity, shifting the researcher’s role from sole author to collaborative curator of machine-generated possibilities, consistent with typologies that frame LLMs as research assistants rather than autonomous authors (Behrend & Landers, 2025). This reframing raises both opportunities for rigour, by expanding item pools, surfacing overlooked dimensions, and reducing blind spots, and risks of opacity, homogenisation, or misplaced trust in algorithmic outputs. Any framework for AI-assisted item generation must therefore grapple explicitly with this methodological tension, namely the simultaneous pursuit of efficiency and conceptual integrity (Grassini, 2023). To address this challenge, we propose a five-component framework for embedding generative AI into scale development while maintaining transparency and replicability.

1. Establishing conceptual foundations: Item generation must remain anchored in clear theoretical definitions and construct boundaries. AI prompts should be theory-driven rather than ad hoc, ensuring that breadth does not come at the expense of construct validity.

2. Outlining a step-by-step workflow: Researchers should specify how AI outputs are generated, filtered, and refined through human oversight. Documenting iterations and decision rules creates an auditable trail and mitigates the opacity that often surrounds both human and machine processes.

3. Specifying prompts and parameters: Transparency requires that the precise inputs (model type, temperature settings, and prompt structures) are disclosed. Such reporting reduces hidden variation, enhances replicability, and allows future researchers to evaluate how modelling choices influenced item outputs.

4. Comparing AI-assisted and traditional approaches: To assess methodological value, AI-generated items should be systematically evaluated against those produced through conventional methods. This includes coverage of construct dimensions, redundancy levels, and downstream psychometric quality.

5. Addressing ethical and practical considerations: Responsible adoption requires explicit attention to risks of bias, opacity, and over-reliance. Safeguards include cross-validation across models, cultural adaptation, disclosure of AI involvement, and training researchers to critically interrogate outputs.
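
As an illustration of the third component, the sketch below shows one way a single generation run could be logged so that the model, temperature, and prompt are disclosed alongside the output. The `GenerationRecord` class, its field names, and the example values are hypothetical and are not drawn from the study; any structured format that captures the same information would serve.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class GenerationRecord:
    """One AI generation run, logged for later audit (hypothetical schema)."""
    model: str                 # model name as reported by the provider
    temperature: float         # sampling temperature used for the run
    prompt: str                # full prompt text sent to the model
    outputs: list[str] = field(default_factory=list)  # raw items returned
    reviewer_note: str = ""    # human decision or rationale

    def prompt_hash(self) -> str:
        # A short fingerprint makes reused or altered prompts easy to spot.
        return hashlib.sha256(self.prompt.encode("utf-8")).hexdigest()[:12]

    def to_json(self) -> str:
        payload = asdict(self)
        payload["prompt_hash"] = self.prompt_hash()
        return json.dumps(payload, ensure_ascii=False, sort_keys=True)


# Example values are placeholders, not the settings used in the study.
record = GenerationRecord(
    model="example-llm",
    temperature=0.7,
    prompt="Generate 10 statements for the dimension 'metacognition'.",
    outputs=["I am aware of how I think and how I make decisions"],
    reviewer_note="Retained after clarity review.",
)
print(record.to_json())
```

Writing one such record per prompt iteration, including discarded runs, would give reviewers the kind of auditable trail that the second component calls for.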

Although illustrated here through the entrepreneurial mindset scale as a proof-of-concept, the framework is designed for broader application. Constructs central to market research, including customer trust, brand loyalty, and service quality, can be operationalised through this workflow. In this way, the framework does not merely automate item generation; it offers a replicable, transferable model that demonstrates how psychometric practice itself is evolving in the age of AI.

Conceptual Foundations

Establishing robust conceptual foundations is the first step in psychometric scale development. For this proof-of-concept, we selected the entrepreneurial mindset, a construct widely invoked across entrepreneurship, management, and education but inconsistently defined (Kuratko et al., 2020; Naumann, 2017; Zappe, 2018). Its fragmented conceptualisation illustrates a broader challenge in market research, where core constructs such as consumer trust or brand engagement are similarly diffuse, contested, and operationalised in divergent ways. Demonstrating how generative AI can advance conceptual alignment in such contexts underscores its broader methodological significance.

To ground the construct, we conducted a systematic literature review following established protocols (e.g., Denyer & Tranfield, 2009; Macpherson & Jones, 2010; Tranfield et al., 2003). Searches across EBSCO Host, ProQuest, and Scopus identified 2,370 publications, of which 427 peer-reviewed articles were retained after screening (see Figure 1). Using NVivo, we coded definitions and conceptualisations of entrepreneurial mindset, extending Hruby et al.’s (2016) thematic analysis of global mindset definitions and related frameworks (Davis et al., 2016; Kuratko et al., 2020; London et al., 2018; Naumann, 2017; Robinson & Gough, 2020; Shekhar et al., 2019). Four recurrent themes emerged: cognition, competence (knowledge, skills, and attributes), personality, and predisposition (attitudes and behaviours).

Figure 1.

Summary of the Systematic Literature Review Methodology (Adapted From Zahoor et al., 2020)

A critical insight from this review is that the entrepreneurial mindset has deep but underexplored roots in cognitive psychology. Early research by the Würzburg School emphasised how tasks (“Aufgaben”) activate cognitive processes that orient individuals toward particular responses (French II, 2016; Gibson, 1941; Gollwitzer, 1990, 2012; Mathisen & Arnulf, 2014). Despite this lineage, contemporary treatments often neglect cognition, resulting in conceptual drift (Buchanan & Kern, 2017; Mathisen & Arnulf, 2014). Re-grounding the entrepreneurial mindset in cognitive psychology not only strengthens its theoretical coherence but also parallels the development of widely used marketing constructs (e.g., brand attitudes, service quality), which are similarly anchored in cognitive–behavioural theory.

Although this review was conducted manually, it revealed the desirability of AI-enabled conceptual synthesis, a capability that has only become practical with the emergence of widely accessible large language models and research-assistant platforms. Recent advances, including SciBERT, Longformer, and ResearchRabbit, enable large-scale literature scans, clustering of related definitions, and preliminary synthesis (Beltagy et al., 2019; Cachola et al., 2020; Hoffmann et al., 2024). Because these tools were still emerging at the time of this study, generative AI was deployed at the definition synthesis stage rather than for the literature review itself. Using ChatGPT and Bard (now Gemini), we amalgamated multiple definitions into candidate formulations. Human researchers retained theoretical fidelity by curating outputs, discarding those that conflated entrepreneurial mindset with adjacent constructs such as entrepreneurial orientation. An illustration of the prompts used for definition refinement is provided in Table 2.

Table 2.

ChatGPT Prompts

Concepts	Description in prompt
Mindset definition	Mindset is “a mental frame or lens that selectively organises and encodes information, thereby orienting an individual toward a unique way of understanding an experience and guiding one toward corresponding actions and responses” (Crum et al., 2013, p. 717)
Summary of mindset research	Mindset research can be summarised as
	1. As human beings, we are limited in our ability to absorb and process information. Thus, we are constantly challenged by the complexity, ambiguity, and dynamism of the information environment around us
	2. We address this challenge through a process of filtration. We are selective in what we absorb and biased in how we interpret it. The term mindset refers to these cognitive filters
	Our mindsets are a product of our histories and evolve through an interactive process. Our current mindset guides the collection and interpretation of new information. To the extent that this new information is consistent with the current mindset, it reinforces that mindset. From time to time, however, new information appears that is truly novel and inconsistent with the existing mindset. When this happens, we either reject the new information or change our mindset. The likelihood that our mindsets will undergo a change depends largely on how explicitly self-conscious we are of our current mindsets: the more hidden and subconscious our cognitive filters, the greater the likelihood of rigidity” (Gupta & Govindarajan, 2002, pp. 116–117)
Definition of entrepreneurial	Entrepreneurial is acting upon opportunities and ideas and transforming them into value for others. The value that is created can be financial, cultural, and social (FFE-YE, 2012)

The resulting definition was:

An entrepreneurial mindset refers to a mental framework or lens through which an individual selectively perceives and interprets information, guiding their understanding, actions, and responses in the context of entrepreneurship. It is shaped by past experiences and evolves through interaction. The mindset filters and biases information processing, reinforcing existing beliefs and behaviours, but remains open to change and adaptation when confronted with inconsistent information. Individuals with such a mindset are self-aware of their cognitive filters, enabling them to navigate complexity, ambiguity, and dynamism in entrepreneurial contexts.

This definition anchored subsequent item generation and highlights a broader methodological insight. AI-assisted synthesis can clarify contested constructs by generating candidate formulations that are then refined through human expertise. In market research, where constructs such as consumer trust and brand engagement are equally fragmented, such a workflow offers a replicable pathway toward conceptual consolidation.
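
A minimal sketch of how the concept descriptions in Table 2 might be assembled into a single definition-synthesis prompt is given below. The exact prompt wording used in the study is not reproduced here, so the instruction text, the `build_definition_prompt` function, and the abbreviated concept descriptions are illustrative assumptions.

```python
def build_definition_prompt(concepts: dict[str, str], target: str) -> str:
    """Compose one priming prompt from labelled concept descriptions."""
    # Each concept becomes a labelled section so the model can tell sources apart.
    sections = [f"{label}:\n{text.strip()}" for label, text in concepts.items()]
    # Hypothetical instruction text; the study's own wording may differ.
    task = (
        f"Using only the material above, write one concise definition of "
        f"'{target}'. Do not add ideas that are absent from the sources."
    )
    return "\n\n".join(sections + [task])


# Abbreviated stand-ins for the full descriptions listed in Table 2.
concepts = {
    "Mindset definition": (
        "Mindset is a mental frame or lens that selectively organises and "
        "encodes information (Crum et al., 2013)."
    ),
    "Summary of mindset research": (
        "Mindsets act as cognitive filters that shape what we absorb and how "
        "we interpret it (Gupta & Govindarajan, 2002)."
    ),
    "Definition of entrepreneurial": (
        "Entrepreneurial is acting upon opportunities and ideas and "
        "transforming them into value for others (FFE-YE, 2012)."
    ),
}

prompt = build_definition_prompt(concepts, target="entrepreneurial mindset")
print(prompt)
```

Keeping the source material and the task instruction in separate, labelled sections makes it easier to document which inputs a candidate definition was derived from.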

Step-by-Step Workflow for AI-Assisted Item Generation

Item generation is one of the most consequential stages in psychometric development because the breadth, clarity, and theoretical fidelity of the initial pool determine all subsequent testing (Churchill Jr., 1979; Hinkin, 1998; Morgado et al., 2018). Traditional approaches combine deductive theory and inductive qualitative insights but are resource-intensive and constrained by human cognition, often yielding pools that are narrow, redundant, or biased (DeVellis & Thorpe, 2016; Netemeyer et al., 2003).

Generative AI can address some of these limitations by producing semantically varied item pools at scale and reducing researcher workload. However, efficiency alone does not guarantee rigour (Beghetto et al., 2025; Görgülü et al., 2025; Russell-Lasalandra et al., 2024). Without structured oversight, AI-generated items risk conceptual drift, redundancy, or the embedding of hidden biases. To mitigate these risks, we developed a transparent and replicable 14-step workflow that integrates automation with expert judgement. The workflow is summarised in Table 3 and described in detail in Appendix A.

Table 3.

AI-Supported Item Generation Process

Step	Description
Step 1 - priming the AI model	The AI model was primed with definitions of the target construct, its key dimensions, and related terminology to establish a theoretical foundation for item generation
Step 2 - context setting	The AI model was informed that the objective was to develop a psychometric scale for the target construct, guiding its output to align with methodological requirements
Step 3 - initial item generation	The AI model generated 10 statements for each of the construct’s dimensions, avoiding direct references to the construct to maintain conceptual clarity
Step 4 - selection of semantically similar items	The AI model selected five statements for each dimension that were most semantically similar while ensuring coverage of the dimension’s breadth
Step 5 - reverse-coding items	One item from each dimension was randomly selected and reverse-coded to introduce response variability and mitigate response bias
Step 6 - randomisation of items	Items were randomised to prevent order effects from influencing respondents’ interpretation and responses
Step 7 - item review and refinement	A grammar and clarity review was conducted to refine the items for grammatical accuracy, conceptual clarity, and simplicity
Step 8 - verification with a secondary AI model	The randomised items were input into a secondary AI model, which inferred the construct being measured and provided feedback on alignment with the intended dimensions
Step 9 - construct validation	The secondary AI model confirmed that the items aligned with the target construct and suggested potential enhancements for conceptual robustness
Step 10 - categorisation of items	The secondary AI model categorised the items into the construct’s dimensions, aligning them with their respective categories
Step 11 - identification of ambiguities	Items that could be confused across dimensions were flagged for review. Revisions included rewording for clarity or substitution from the original item pool
Step 12 - item randomisation	The finalised item list was randomised, and the AI model identified items least representative of the construct, often identifying reverse-coded items as expected
Step 13 - persona-based validation	The AI model assumed the persona of a target respondent to rate the items, providing justifications for responses. This process was repeated with multiple personas to ensure conceptual alignment
Step 14 - final categorisation check	The final list of items was re-input into the AI model, categorising them into the construct’s dimensions and validating alignment with the intended theoretical structure
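
Steps 5 and 6 in Table 3 lend themselves to a short sketch. The code below picks one item per dimension at random to be flagged for reverse-wording, then shuffles the presentation order, using a fixed seed so that the selection can be reproduced. The function names, the seed, the placeholder item labels, and the five-point response format assumed in `reverse_score` are illustrative; the source does not specify them, and rewording a flagged item remains a task for the researcher or the model.

```python
import random


def flag_reverse_items(pool: dict[str, list[str]], seed: int) -> dict[str, str]:
    """Step 5: pick one item per dimension to be reworded as reverse-coded."""
    rng = random.Random(seed)  # seeded so the selection can be reproduced
    return {dimension: rng.choice(items) for dimension, items in pool.items()}


def randomise_order(pool: dict[str, list[str]], seed: int) -> list[tuple[str, str]]:
    """Step 6: shuffle all (dimension, item) pairs into one presentation order."""
    rng = random.Random(seed)
    ordered = [(dimension, item) for dimension, items in pool.items() for item in items]
    rng.shuffle(ordered)
    return ordered


def reverse_score(response: int, scale_points: int = 5) -> int:
    """Map a response on a 1..scale_points scale to its reversed value."""
    if not 1 <= response <= scale_points:
        raise ValueError("response outside the scale range")
    return scale_points + 1 - response


# Placeholder labels stand in for real item wording.
pool = {
    "Metacognition": ["M1", "M2", "M3", "M4", "M5"],
    "Beliefs": ["B1", "B2", "B3", "B4", "B5"],
}
flagged = flag_reverse_items(pool, seed=2024)
order = randomise_order(pool, seed=2024)
print(flagged)
print(order)
```

Recording the seed alongside the prompts means that both the choice of reverse-coded items and the item order can be regenerated exactly by another researcher.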

The process began with construct priming, where AI models were provided with the theoretical definition and the six cognitive dimensions derived from our literature review: cognitive adaptability, heuristics, goal orientation, metacognition, beliefs, and entrepreneurial alertness. From this foundation, generative AI produced item pools, which were then subjected to systematic filtering and refinement for clarity, psychometric suitability, and neutrality. Illustrative outputs are presented in Table 4.

Table 4.

Dimensions and Example Items for the Entrepreneurial Mindset Scale

Dimensions	Definition	Example of item
Cognitive adaptability	We define cognitive adaptability as the ability to effectively and appropriately change decision policies (i.e., to learn) given feedback (inputs) from the environmental context in which cognitive processing is embedded (Haynie & Shepherd, 2009, p. 695)	I am reluctant to change my thinking when faced with new information or ways of doing things
Heuristics and biases	Biases and heuristics are decision rules, cognitive mechanisms, and subjective opinions people use to assist in making decisions. They simplify strategies that individuals use to make decisions, especially in uncertain and complex conditions	I draw on my past experiences to guide my decision-making
Goal orientation	Goal orientation differs between goal setting (deliberative) and goal striving (implemental). The deliberative stage is impartial and open-minded. Individuals evaluate the positive and negative effects of a decision to be taken and the desired goal regarding its feasibility and desirability. At this stage, goals are set. In the implemental stage, individuals strive towards goal achievement and process information related to where, how and when the goal is implemented. The decision is made, and they act on it (Mathisen & Arnulf, 2013)	I set ambitious goals for myself
Metacognition	“Metacognition describes a higher-order, cognitive process that serves to organize what individuals know and recognize about themselves, tasks, situations, and their environments in order to promote effective and adaptable cognitive functioning in the face of feedback from complex and dynamic environments” (Brown, 1987; Flavell, 1979, 1987; Haynie & Shepherd, 2009, p. 696)	I am aware of how I think and how I make decisions
Beliefs	Deeply held strong assumptions that underpin our sensemaking and our decision-making (Krueger, 2007, p. 124)	I can see the positive aspects of every situation
Entrepreneurial alertness	Kirzner (1979) defined alertness as an individual’s ability to identify opportunities others overlook. In further developing the boundaries of alertness, we argue that an important component of alertness is the aspect of judgment, which focuses on evaluating new changes, shifts, and information and deciding if they would reflect a business opportunity with profit potential. We define alertness as consisting of three distinct elements: scanning and searching for information, connecting previously disparate information, and evaluating profitable business opportunities (Tang et al., 2012)	I am constantly scanning the environment for new opportunities

Each tool contributed in different ways. ChatGPT produced broad and semantically diverse item pools, Bard (Gemini) assisted with redundancy detection and semantic clustering, and Grammarly aided in linguistic refinement. Human researchers remained central throughout, removing vague or leading items, checking reverse-coded items, eliminating potential construct contamination (for example, drift toward entrepreneurial orientation), and verifying cultural neutrality. While redundancy reduction was partly automated through cross-model comparison and similarity clustering, final decisions rested with expert reviewers.
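
The redundancy screening described above can be sketched in a few lines. The example below is a minimal illustration and not the procedure used in the study: it substitutes a simple word-overlap (Jaccard) measure for model-based similarity clustering, the threshold of 0.6 is an arbitrary assumption, and the second item in the pool is a hypothetical near-duplicate added for demonstration. Flagged pairs are only candidates; the final decision stays with expert reviewers.

```python
import re
from itertools import combinations


def tokens(item):
    """Return the set of lowercase word tokens in an item."""
    return set(re.findall(r"[a-z']+", item.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two items (0 = disjoint, 1 = identical)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def flag_near_duplicates(items, threshold=0.6):
    """Return (i, j, similarity) for item pairs at or above the threshold."""
    flagged = []
    for (i, a), (j, b) in combinations(enumerate(items), 2):
        sim = jaccard(a, b)
        if sim >= threshold:
            flagged.append((i, j, round(sim, 2)))
    return flagged


pool = [
    "I set ambitious goals for myself",
    "I set ambitious goals for myself at work",  # hypothetical near-duplicate
    "I am aware of how I think and how I make decisions",
    "I draw on my past experiences to guide my decision-making",
]

for i, j, sim in flag_near_duplicates(pool):
    print(f"Review items {i} and {j} (similarity {sim})")
```

Word overlap misses paraphrases that share few words, which is one reason embedding-based similarity is listed among the prospective enhancements.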

This hybrid AI–human workflow accelerated development while maintaining theoretical alignment. Just as importantly, it established a replicable process. Prompts, parameters, and decision criteria were documented so that future researchers can reproduce or adapt the workflow. In addition to the procedures applied in this study, the methodological framework also outlines several advanced techniques, such as semantic similarity clustering with vector embeddings (Russell-Lasalandra et al., 2024), persona-based simulated respondent testing, and adaptive logic, as prospective enhancements to AI-assisted item generation. Overall, the workflow illustrates how generative AI can be embedded into item generation in ways that increase efficiency without undermining conceptual integrity. It signals a methodological shift in which researchers move from sole authorship to critical curation and auditing of machine-generated outputs, ensuring that psychometric rigour remains grounded in human judgement.

Comparative Analysis

Table 5 contrasts AI-assisted and traditional item generation. Conventional methods typically involve prolonged cycles of drafting and expert review, producing item pools that are often narrow in scope and vulnerable to redundancy, conceptual drift, and cultural bias. By comparison, the AI-assisted approach broadened semantic diversity and embedded automated checks, while preserving a central role for human oversight.

Table 5.

Comparison of Traditional vs. AI-Assisted Item Generation

Dimension	Traditional item generation	AI-assisted item generation (this study)
Time required	Typically, 4–8 weeks, involving literature synthesis, expert workshops, iterative drafting and redrafting	2–3 days to generate, refine, and review a comprehensive pool of items
Labor intensity	High: requires multiple experts, qualitative coding, and repeated meetings	Moderate: AI automates initial item pool creation; human experts focus on oversight and theoretical alignment
Cost	Substantial: researcher time, transcription/analysis of qualitative data, and extended workshops	Lower: limited AI usage costs; main expense is expert review time
Conceptual coverage	Constrained by human cognitive limits and potential biases; often narrow and repetitive item pools	Broad: large, diverse item sets generated rapidly, spanning multiple semantic formulations
Redundancy management	Manual review, time-consuming, risk of overlooking near-duplicates	Automated support (cross-model checks, Bard redundancy detection; future use of VectorStores for semantic clustering)
Bias control	Dependent on expert awareness, unconscious biases may persist in item pools	Partial automation: outputs can be audited for gendered/cultural phrasing, but still require human intervention to ensure inclusivity
Flexibility under time pressure	Limited: timelines constrain scope of item pool and refinement	High: enables rapid development while maintaining theoretical fidelity
Transparency & replicability	Processes are often under-reported; difficult to replicate exactly	Prompts, parameters, and workflow documented (Appendix A), supporting replication and adaptation

The chief advantage of AI-assisted item generation lies in efficiency and breadth. Generative models such as ChatGPT or Gemini can produce hundreds of candidate items in a fraction of the time required for traditional approaches, while redundancy detection and semantic clustering help to minimise conceptual blind spots. However, these advantages come with caveats. Commercial LLMs function with limited transparency, leaving researchers little insight into their internal processes (Dwivedi et al., 2023; Zhuo et al., 2023). They also reflect cultural and linguistic asymmetries present in the corpora on which they are trained (Bender et al., 2021; Ferrara, 2023; Weidinger et al., 2021), which can compromise inclusivity and construct validity if not actively mitigated.

Equity presents another challenge. Uneven access to premium AI platforms across institutions and geographies may widen methodological divides between well-resourced and less-resourced scholars. In the absence of deliberate safeguards, such disparities risk reinforcing structural inequities in opportunities to innovate within psychometric practice.

For these reasons, AI should be understood not as a replacement for traditional methods but as a complement to them. Where traditional approaches emphasise rigour, transparency, and theoretical fidelity, AI contributes speed, scalability, and semantic variation. The methodological opportunity lies in integration: developing hybrid workflows that retain the strengths of both while mitigating their respective weaknesses.

AI Prompts and Parameters

Prompt design is central to both transparency and replicability. Poorly structured prompts risk conceptual drift, whereas well-designed prompts anchor item generation in theoretical definitions (Beghetto et al., 2025; Gao et al., 2023; Russell-Lasalandra et al., 2024). In this study, prompts were anchored in established definitions, constrained for psychometric suitability (DeVellis & Thorpe, 2016; Hinkin, 1998), and designed to balance semantic breadth with conceptual precision. Different tools were employed in complementary ways, with ChatGPT providing creative breadth, Bard assisting with redundancy detection, and Grammarly refining language. Cross-model triangulation further enhanced robustness by reducing the likelihood of overfitting to any single model’s idiosyncrasies (Chen et al., 2021; Dwivedi et al., 2023).

The deliberate documentation of prompt structures, parameter settings, and workflow decisions was not ancillary but central to methodological rigour. Full prompt sets and configurations are provided in Appendix A, enabling replication and critical evaluation. In doing so, the study offers a template for transparent reporting in AI-assisted psychometrics.
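
As an illustration of what such documentation might look like in practice, the sketch below builds a definition-anchored prompt and stores it alongside the settings used, so that each generation run leaves an auditable record. It does not reproduce the prompts in Appendix A: the prompt wording, the `PromptRecord` fields, the model label `example-llm`, and the temperature value are all hypothetical. Only the Beliefs definition is taken from Table 4.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PromptRecord:
    """One documented generation run (field names are illustrative)."""
    dimension: str
    definition: str
    n_items: int
    model: str          # hypothetical model label
    temperature: float  # hypothetical parameter setting
    prompt: str = ""


def build_prompt(dimension, definition, n_items):
    """Anchor the request in the construct definition and state item constraints."""
    return (
        f"Construct dimension: {dimension}\n"
        f"Definition: {definition}\n"
        f"Write {n_items} first-person survey items for this dimension. "
        "Each item should express a single idea, avoid leading or "
        "double-barrelled wording, and use culturally neutral language."
    )


def make_record(dimension, definition, n_items, model, temperature):
    """Create a record whose prompt text is stored with its parameters."""
    record = PromptRecord(dimension, definition, n_items, model, temperature)
    record.prompt = build_prompt(dimension, definition, n_items)
    return record


record = make_record(
    dimension="Beliefs",
    definition=(
        "Deeply held strong assumptions that underpin our sensemaking "
        "and our decision-making (Krueger, 2007, p. 124)"
    ),
    n_items=10,
    model="example-llm",
    temperature=0.7,
)

# Serialise the record so it can be archived with the study materials.
print(json.dumps(asdict(record), indent=2))
```

Keeping the prompt text and parameters in one serialisable record makes it straightforward to archive every run and to report exactly which configuration produced a given item pool.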

Ethical and Practical Considerations

Generative AI introduces methodological opportunities but also a new layer of risk. Transparency is a longstanding weakness in psychometric item generation, which has often been described as a “black box” process (Morgado et al., 2018). Researchers seldom disclose how candidate items were produced, or which ones were discarded and why. AI threatens to compound this opacity unless prompts, model parameters, and workflow decisions are disclosed in sufficient detail (Görgülü et al., 2025; Russell-Lasalandra et al., 2024). To counter this, our framework embeds replication readiness by publishing the full set of prompts and configurations, making explicit processes that have traditionally remained opaque.

Bias represents another systemic challenge. LLMs inherit cultural, gendered, and linguistic asymmetries from their training corpora (Bender et al., 2021; Ferrara, 2023). In our outputs, we observed gendered phrasing and culturally narrow examples, patterns that were mitigated through cross-model triangulation and human auditing but not fully eliminated (Grassini, 2023; Macdonald et al., 2023). Long-term mitigation will require domain-specific architectures, including fine-tuned models or retrieval-augmented generation, trained on corpora that are diverse, representative, and context-sensitive (Lewis et al., 2021; Russell-Lasalandra et al., 2024).

Privacy and deployment risks further complicate adoption. Commercial LLMs raise unresolved concerns about data ownership, security, and model retraining (Dwivedi et al., 2023; Zhuo et al., 2023). These risks are especially pertinent in market research, where psychometric instruments routinely capture sensitive attitudinal and behavioural data. Addressing them requires privacy-by-design principles (Dasigi et al., 2021), and movement toward locally hosted or RAG-enhanced models that minimise dependence on proprietary platforms (Hoffmann et al., 2024).

Together, these concerns underline a key point. AI-assisted psychometrics is not only a technical innovation but also an ethical practice. Without robust safeguards, efficiency gains may be offset by hidden risks to validity, inclusivity, and trust, reinforcing the need for structured human oversight of LLM use in research (Behrend & Landers, 2025).

Case Study: Entrepreneurial Mindset Scale – Proof of Concept

The AI-assisted workflow (Table 3) was applied to the entrepreneurial mindset construct as a proof of concept. The aim was not to provide an exhaustive procedural account but to illustrate what changes when generative AI is embedded into the most cognitively demanding stage of psychometric scale development. Specifically, the case illustrates the accelerated production of a pilot-ready scale, alongside the design of test samples and initial psychometric evidence. The methodological framework outlines advanced options such as semantic similarity clustering, simulated respondents, and adaptive logic as potential extensions to the workflow. These were not applied in this proof-of-concept study, which focused solely on item generation and refinement.

Application of the workflow generated a 33-item pool mapped onto six cognitive dimensions of entrepreneurial mindset. What would normally require weeks of expert consultation and iterative drafting was achieved in fewer than three days. This acceleration proved decisive. It enabled inclusion of the scale in the 2023–24 Global University Entrepreneurial Spirit Students’ Survey (GUESSS). Without AI, the development timeline would almost certainly have precluded participation. The case, therefore, demonstrates how AI expands the time horizons of psychometric development, enabling bespoke scale design under severe time and resource constraints.

Acceleration did not eliminate the need for oversight. Generative AI produced diverse candidate items, but human intervention remained indispensable. For example, a generic item such as “I can adapt to changes in my environment” was sharpened into “I can quickly adjust my strategies when encountering unexpected challenges.” Items drifting toward entrepreneurial orientation, a related but distinct construct, were excluded, and reviewers flagged ambiguous or culturally biased phrasing. This illustrates the central tension of AI-assisted methods: the very tools that expand item breadth simultaneously increase the risk of construct drift and bias, thereby amplifying the importance of researcher judgement.

Feasibility testing of the AI-generated items was conducted with two university student samples, reflecting the scale’s intended application in entrepreneurship education contexts. The first dataset came from the 2023–24 Global University Entrepreneurial Spirit Students’ Survey (GUESSS) at the University of Auckland (n = 1,319). The second dataset was collected in the United Kingdom via Prolific Academic’s student panel (n = 510). Prolific’s pre-screening ensured that only currently enrolled higher-education students were recruited. This dual-sample design strengthens the initial feasibility test and demonstrates the applicability of AI-generated items within entrepreneurship education contexts.

Preliminary psychometric checks were promising. Internal consistency exceeded conventional thresholds for most dimensions (Nunnally & Bernstein, 1994), although heuristics fell slightly below. Exploratory factor analysis suggested a two-factor higher-order structure, with entrepreneurial alertness loading strongly on one factor. Reverse-coded items proved unstable and were removed, consistent with prior cautionary notes on their reliability (DeVellis & Thorpe, 2016). These findings are presented as evidence of feasibility, with full validation to follow in subsequent work.
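
For readers less familiar with the internal consistency check mentioned above, the sketch below computes Cronbach's alpha, α = k/(k − 1) × (1 − Σ item variances / variance of total scores), after reverse-scoring a negatively keyed item. It is a minimal illustration under stated assumptions: the responses are invented, the 1 to 5 response range is assumed, and the code is not the analysis pipeline used in the study. A value of about 0.70 is a commonly cited conventional threshold.

```python
from statistics import variance


def reverse_code(score, low=1, high=5):
    """Reverse-score a response on a low-to-high scale (assumed 1 to 5)."""
    return low + high - score


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([r[i] for r in responses]) for i in range(k)]
    total_var = variance([sum(r) for r in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Invented responses: five respondents, three items; the third is reverse-keyed.
raw = [
    [4, 5, 2],
    [3, 3, 3],
    [5, 4, 1],
    [2, 2, 4],
    [4, 4, 2],
]

# Reverse-score the third item so that all items point in the same direction.
scored = [[a, b, reverse_code(c)] for a, b, c in raw]

alpha = cronbach_alpha(scored)
print(f"Cronbach's alpha: {alpha:.2f}")
```

Computing alpha with and without the reverse-keyed items is one simple way to see whether those items are weakening a dimension's consistency.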

The case, therefore, illustrates both the promise and limits of automation. AI reduced development time, broadened semantic coverage, and enabled participation in an international survey under compressed timelines. Human oversight was vital in refining items, clarifying ambiguities, and preserving theoretical integrity. AI altered, rather than displaced, researcher involvement. It redirected effort from drafting to the critical tasks of evaluation, curation, and ethical governance.

Rather than substituting for human expertise, the hybrid workflow reconciles speed with rigour by redistributing expertise to the points where it is most valuable. The lessons extend well beyond entrepreneurship. Constructs such as brand loyalty, consumer trust, and perceived service quality also require measures that are both theoretically robust and adaptable to fast-changing contexts. In this process, generative AI delivers acceleration and semantic breadth, while human researchers safeguard conceptual clarity, cultural sensitivity, and ethical integrity. To support replication, the study provides full prompt sets, model parameters, and decision criteria in Appendix A.

Implications for Psychometric Scale Development in Market Research

This study carries significant implications for market research, highlighting new ways to enhance the efficiency, adaptability, and transparency of psychometric scale development (Russell-Lasalandra et al., 2024). Generative AI enables faster development while supporting precision in construct operationalisation. Constructs central to marketing practice, like brand loyalty, customer engagement, trust, and perceived service quality, demand instruments that are theoretically robust yet agile enough to adapt to fast-changing contexts.

One implication lies in rapid contextual customisation. Traditional development cycles require lengthy iterations between literature reviews, expert panels, and focus groups, delaying actionable insights. By contrast, generative AI can analyse real-time consumer data, linguistic usage, and cultural nuance to propose items attuned to specific cohorts or industries (Hoffmann et al., 2024). A loyalty scale could be adapted for younger versus older consumers, or a trust measure fine-tuned for financial services versus social media, without restarting the entire development process. This yields instruments that are contextually specific while remaining timely, thereby enhancing their managerial relevance (Gao et al., 2023).

A second implication concerns scalability across markets and populations. Global studies often require substantial manual effort to adapt instruments while retaining comparability. Generative AI can produce parallel item pools tailored to local contexts yet linked to a common measurement framework (Russell-Lasalandra et al., 2024). For multinational firms tracking brand engagement, this allows instruments to be simultaneously localised and standardised. Such scalability accelerates timelines and enables more granular segmentation, strengthening responsiveness to diverse consumer environments.

A third implication is the capacity to move beyond static measurement. AI tools can generate items responsive to emerging discourse, such as consumer reactions during a viral campaign (Rossi et al., 2024). Adaptive questionnaires, which dynamically adjust items based on prior responses, reduce redundancy and fatigue while improving psychometric quality (Grassini, 2023). This signals a shift toward more adaptive, respondent-centred measurement, fulfilling a long-standing aspiration in market research for instruments that evolve alongside consumer behaviour.

The study also underscores the value of hybrid workflows. In practice, different tools proved complementary, with ChatGPT generating semantically rich items, Bard resolving redundancies, and Grammarly refining linguistic clarity. Coupled with expert judgement, this ensemble yielded a more diverse and precise item pool. The implication for practitioners is clear. AI does not replace expertise but redistributes it. Researchers become curators and auditors of item pools rather than sole originators, shifting attention toward construct integrity, theoretical alignment, and ethical safeguards (Dwivedi et al., 2023).

Looking ahead, three domains require particular attention.

1. Cross-cultural and multilingual adaptation. Generative AI can accelerate translation, detect idiomatic mismatches, and flag culturally loaded phrasing, lowering the cost of international studies. Systematic testing of equivalence remains essential to guard against false comparability.

2. Respondent experience. While AI can streamline surveys, little is known about how respondents perceive AI-generated items. Do they appear clearer and more engaging, or less trustworthy because of their perceived artificiality? Comparative studies of data quality and engagement will determine whether efficiency gains translate into authentic insight.

3. Safeguards against automation complacency. The speed of AI risks premature adoption of unvalidated scales in commercial contexts. Protocols must therefore be institutionalised. These include expert panels for construct definition, mandatory reliability and validity testing, and training for critical interrogation of AI outputs. Without such safeguards, efficiency could erode conceptual clarity and ethical integrity.

In sum, the implications extend well beyond the entrepreneurial mindset. Generative AI can accelerate conceptualisation, broaden item diversity, enable adaptive and cross-cultural measurement, and provide scales that are both timely and psychometrically rigorous. However, these benefits are realised only if researchers resist over-reliance, attend to inclusivity and respondent trust, and maintain institutional safeguards. Under such conditions, AI functions not only as a tool for efficiency but as a catalyst for re-examining psychometric practice, recasting scale development as a more agile, transparent, and participant-centred process at the heart of market research. Recent work has outlined general frameworks for how large language models can be incorporated into organisational research (Behrend & Landers, 2025). Building on this foundation, the present study demonstrates how those principles can be operationalised within market research through a replicable, construct-focused workflow for psychometric scale development.

Limitations

Several limitations of this study should be acknowledged. First, it does not present a fully validated scale. The entrepreneurial mindset instrument is offered as a proof of concept, with full validation, including confirmatory factor analysis, reliability testing, and assessments of convergent and discriminant validity, to be reported separately. Accordingly, this study is positioned as a methodological contribution on AI-assisted item generation rather than as the presentation of a finalised instrument.

Second, the study relies on commercial large language models (LLMs). While these platforms offer accessibility and replicability, they are constrained by opaque “black-box” architectures and by the cultural and linguistic biases embedded in their training data (Dwivedi et al., 2023; Hoffmann et al., 2024). Their use illustrates both the potential and the tension of AI in scale development. Models accelerate item generation but at the cost of transparency and interpretability. Domain-specific alternatives such as fine-tuned open-source models or retrieval-augmented generation (RAG) architectures may offer greater control, bias mitigation, and theoretical traceability (Grassini, 2023; Russell-Lasalandra et al., 2024). Comparative evaluations of these architectures remain an important next step.

Third, the case study is narrow in scope. Entrepreneurial mindset is a useful illustrative construct but remains peripheral to the central concerns of market research, and the initial testing relied on Western-educated student samples. As a result, the generalisability of AI-generated items across constructs, industries, and cultural settings is unproven. Establishing robustness through cross-cultural, multilingual, and industry-specific validation is therefore a critical research priority (Beghetto et al., 2025; Görgülü et al., 2025). This limitation highlights the need for work that positions AI not only as a technical aid but also as a tool for inclusive and globally relevant measurement.

Finally, this study concentrated on the technical workflow of item generation rather than respondent experience or longer-term researcher practices. Risks such as automation complacency, where researchers over-trust AI outputs, require careful management through explicit validation protocols, ethical safeguards, and training in human–AI collaboration (Dwivedi et al., 2023; Hoffmann et al., 2024). Equally, little is known about how respondents perceive AI-generated items. Whether they view them as clearer, more engaging, or less trustworthy than human-authored items remains an open question with direct implications for data quality and instrument legitimacy.

Taken together, these limitations suggest three priority directions for future research: (1) advancing beyond commercial LLMs to domain-specific and transparent architectures, (2) extending validation across cultures, languages, and industries to establish generalisability, and (3) examining the behavioural dynamics of human–AI collaboration from both researcher and respondent perspectives. Addressing these gaps will be essential for establishing AI as a rigorous, transparent, and globally relevant tool for psychometric scale development.

Conclusions

This paper advances methodological practice in market research by demonstrating how generative AI can be systematically integrated into psychometric scale development. Focusing on the pivotal stage of item generation, we outlined a replicable framework that translates theoretical constructs into candidate items through prompt engineering, accelerates production without compromising conceptual rigour, and embeds safeguards through structured human oversight (Russell-Lasalandra et al., 2024). The entrepreneurial mindset case served as a proof of concept, showing how constructs that typically require weeks of expert iteration can be operationalised within days and incorporated into international surveys under real-world constraints.

The central contribution lies in reframing item generation as a methodological contradiction where acceleration and opacity coexist. Generative AI compresses timelines dramatically (Beghetto et al., 2025), but its “black-box” nature threatens transparency and introduces bias (Dwivedi et al., 2023; Hoffmann et al., 2024). The hybrid human–AI process demonstrated here shows that this tension can be managed productively. Automation broadens the semantic and conceptual scope of item pools, while expert oversight safeguards validity, alignment, and ethical integrity. Efficiency is therefore best understood not as an end in itself but as a redistribution of researcher effort from routine drafting to higher-order judgement and curation (Görgülü et al., 2025).

The study also underscores the need to extend inquiry beyond a single case. Entrepreneurial mindset is peripheral to the core concerns of market research, and reliance on Western student samples limits generalisability. Constructs central to marketing practice, such as brand loyalty, consumer trust, engagement, and perceived service quality, provide vital testbeds for assessing whether AI-assisted methods can deliver scalable, context-sensitive, and globally relevant instruments. Systematic replication across industries, populations, and cultural contexts will be essential if AI is to be consolidated as a robust methodological resource.

More broadly, the findings point toward a re-examination of psychometric practice in an AI era. Item generation has long been a labour-intensive yet opaque preliminary step (Morgado et al., 2018). Generative AI transforms it into an auditable and replicable process in which prompts, parameters, and decision rules can be documented in ways rarely possible in human-only workflows. This shift signals a future in which psychometrics evolves from descriptive tool-making to an adaptive, dynamic, and accountable science of measurement.

In conclusion, generative AI should not be understood as a substitute for established psychometric practice but as a catalyst for its renewal. By formalising a transparent, replicable, and human-centred approach to item generation, this study provides a methodological foundation upon which future scholarship can build (Rossi et al., 2024). As AI reshapes research practices, psychometrics and market research can benefit from approaches that couple computational power with scholarly oversight, contributing to higher standards of methodological rigour, inclusivity, and innovation.

Footnotes

ORCID iDs

Darsel Keane

Rod B. McNaughton

Ethical Approval

The research stage described in the paper did not require ethics approval.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The manuscript does not report the results of data analysis.

Practical Guide for AI-Assisted Item Generation

This appendix provides a detailed account of the workflow we used for AI-assisted item generation. While the main text outlines the conceptual rationale, here we document the full stepwise process, including prompts, model parameters, and decision points requiring human oversight. The objective is to maximise transparency and replicability, enabling other researchers to adapt this workflow to their own constructs. The workflow described below includes both the steps applied in our proof-of-concept study and additional exploratory options (e.g., persona-based simulated respondents, adaptive logic). These latter steps are included to illustrate potential enhancements for future applications of AI-assisted item generation but were not implemented in the entrepreneurial mindset case study.

References

Beghetto

R. A.

Ross

Karwowski

Glăveanu

V. P.

(2025). Partnering with AI for instrument development: Possibilities and pitfalls. New Ideas in Psychology, 76, p.101121, Article 101121. https://doi.org/10.1016/j.newideapsych.2024.101121

Behrend

T. S.

Landers

R. N.

(2025). Participant interactions with artificial intelligence: Using large language models to generate research materials for surveys and experiments. Journal of Business and Psychology. https://doi.org/10.1007/s10869-025-10035-6

Beltagy

Cohan

(2019). SciBERT: A pretrained language model for scientific text. (arXiv:1903.10676). arXiv. https://doi.org/10.48550/arXiv.1903.10676

Bender

E. M.

Gebru

McMillan-Major

Shmitchell

(2021). On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’21), accountability, and transparency Virtual Event Canada, March 3 - 10, 2021 (pp. 610–623). Association for Computing Machinery. https://doi.org/10.1145/3442188.3445922

Brown

A. L.

(1987). Metacognition, executive control, self-regulation, and other more mysterious mechanisms. In Weinert

F. E.

Kluwe

R. H.

(Eds.), Metacognition, motivation, and understanding (pp. 65–116). Hillsdale, NJ: Lawrence Erlbaum Associates.

Buchanan

Kern

M. L.

(2017). The benefit mindset: The psychology of contribution and everyday leadership. International Journal of Wellbeing, 7(1), 1–11. https://doi.org/10.5502/ijw.v7i1.538

Cachola

Cohan

Weld

D. S.

(2020). TLDR: Extreme summarization of scientific documents. (arXiv:2004.15011). arXiv. https://doi.org/10.48550/arXiv.2004.15011

Chaudhuri

Holbrook

M. B.

(2001). The chain of effects from brand trust and brand affect to brand performance: The role of brand loyalty. Journal of Marketing, 65(2), 81–93. https://doi.org/10.1509/jmkg.65.2.81.18255

Chen

Tworek

Jun

Yuan

Pinto

H. P. de O.

Kaplan

Edwards

Burda

Joseph

Brockman

Ray

Puri

Krueger

Petrov

Khlaaf

Sastry

Mishkin

Chan

Gray

Zaremba

(2021). Evaluating large language models trained on code. arXiv:2107.03374. arXiv. https://doi.org/10.48550/arXiv.2107.03374

10.

Churchill

G. A.

Jr (1979). A paradigm for developing better measures of marketing constructs. Journal of Marketing Research, 16(1), 64–73. https://doi.org/10.2307/3150876

11.

Clark

L. A.

Watson

(1995). Constructing validity: Basic issues in objective scale development. Psychological Assessment, 7(3), 309–319. https://doi.org/10.1037/1040-3590.7.3.309

12.

Clickup . (2024). How to use AI for market research (use cases & tools). Clickup. https://clickup.com/blog/how-to-use-ai-for-market-research/

13.

Crum

A. J.

Salovey

Achor

(2013). Rethinking stress: The role of mindsets in determining the stress response. Journal of Personality and Social Psychology, 104(4), 716–733. https://doi.org/10.1037/a0031201

14.

Dasigi

Beltagy

Cohan

Smith

N. A.

Gardner

(2021). A dataset of information-seeking questions and answers anchored in research papers. In Toutanova

Rumshisky

Zettlemoyer

Hakkani-Tur

Beltagy

Bethard

Cotterell

Chakraborty

Zhou

(Eds.), Proceedings of the 2021 conference of the north American chapter of the association for computational linguistics: Human language technologies, (NAACL-HLT 2021). Online June 2021. (pp. 4599–4610). Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.naacl-main.365

15.

Davis

M. H.

Hall

J. A.

Mayer

P. S.

(2016). Developing a new measure of entrepreneurial mindset: Reliability, validity, and implications for practitioners. Consulting Psychology Journal: Practice and Research, 68(1), 21–48. https://doi.org/10.1037/cpb0000045

16.

Dawis

R. V.

(1987). Scale construction. Journal of Counseling Psychology, 34(4), 481–489. https://doi.org/10.1037/10109-023

17.

Denyer

Tranfield

(2009). Producing a systematic review. In The sage handbook of organizational research methods (pp. 671–689). Sage Publications Ltd.

18. DeVellis, R. F., & Thorpe, C. T. (2016). Scale development: Theory and applications (4th ed.). Sage Publications. https://tms.iau.ir/file/download/page/1635238305-develis-2017.pdf

19. Dwivedi, Y. K., Kshetri, Hughes, Slade, E. L., Jeyaraj, Kar, A. K., Baabdullah, A. M., Koohang, Raghavan, Ahuja, Albanna, Albashrawi, M. A., Al-Busaidi, A. S., Balakrishnan, Barlette, Basu, Bose, Brooks, Buhalis, ... Wright (2023). Opinion paper: “So what if ChatGPT wrote it?” multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy. International Journal of Information Management, 71, Article 102642. https://doi.org/10.1016/j.ijinfomgt.2023.102642

20. Ferrara (2023). Should ChatGPT be biased? Challenges and risks of bias in large language models. First Monday, 28(11), 1. https://doi.org/10.5210/fm.v28i11.13346

21. FFE-YE (2012). Impact of entrepreneurship education in Denmark—2011. The Danish Foundation for Entrepreneurship – Young Enterprise. https://eng.ffe-ye.dk/media/202248/impact_of_entrepreneurship_education_in_denmark_2011.pdf

22. Flavell, J. H. (1979). Metacognition and cognitive monitoring: A new area of cognitive–developmental inquiry. American Psychologist, 34(10), 906–911.

23. Flavell, J. H. (1987). Speculations about the nature and development of metacognition. In Weinert, F. E., & Kluwe, R. H. (Eds.), Metacognition, motivation, and understanding (pp. 21–29). Hillsdale, NJ: Lawrence Erlbaum Associates.

24. French, R. P., II (2016). The fuzziness of mindsets: Divergent conceptualizations and characterizations of mindset theory and praxis. International Journal of Organizational Analysis, 24(4), 673–691. https://doi.org/10.1108/IJOA-09-2014-0797

25. Gao, Yen, & Chen (2023). Enabling large language models to generate text with citations (arXiv:2305.14627). arXiv. https://doi.org/10.48550/arXiv.2305.14627

26. Gerbing, D. W., & Anderson, J. C. (1988). An updated paradigm for scale development incorporating unidimensionality and its assessment. Journal of Marketing Research, 25(2), 186–192. https://doi.org/10.2307/3172650

27. Gibson (1941). A critical review of the concept of set in contemporary experimental psychology. Psychological Bulletin, 38(9), 781–817. https://doi.org/10.1037/h0055307

28. Gollwitzer, P. M. (1990). Action phases and mind-sets. In Higgins, E. T., & Sorrentino, R. M. (Eds.), Handbook of motivation and cognition: Foundations of social behavior (Vol. 2, pp. 53–92). The Guilford Press. https://www.socmot.uni-konstanz.de/sites/default/files/90_Gollwitzer_Action_Phases_MindSets.pdf

29. Gollwitzer, P. M. (2012). Mindset theory of action phases. In Van Lange, Kruglanski, & Higgins, E. T. (Eds.), Handbook of theories of social psychology (Vol. 1, pp. 526–546). Sage Publications Ltd. https://doi.org/10.4135/9781446249215.n26

30. Görgülü, Coşkun, Demir, & Sipahioğlu (2025). A psychometric analysis of the artificial intelligence skills scale developed through chat GPT. Education and Information Technologies, 30(9), 12489–12516. https://doi.org/10.1007/s10639-024-13294-7

31. Grassini (2023). Development and validation of the AI attitude scale (AIAS-4): A brief measure of general attitude toward artificial intelligence. Frontiers in Psychology, 14, 1191628. https://doi.org/10.3389/fpsyg.2023.1191628

32. Gupta, A. K., & Govindarajan (2002). Cultivating a global mindset. Academy of Management Perspectives, 16(1), 116–126. https://doi.org/10.5465/ame.2002.6640211

33. Haynie, J. M., & Shepherd, D. A. (2009). A measure of adaptive cognition for entrepreneurship research. Entrepreneurship Theory and Practice, 33(3), 695–714. https://doi.org/10.1111/j.1540-6520.2009.00322.x

34. Hinkin, T. R. (1998). A brief tutorial on the development of measures for use in survey questionnaires. Organizational Research Methods, 1(1), 104–121. https://doi.org/10.1177/109442819800100106

35. Hoffmann, Lasarov, & Dwivedi, Y. K. (2024). AI-empowered scale development: Testing the potential of ChatGPT. Technological Forecasting and Social Change, 205, Article 123488. https://doi.org/10.1016/j.techfore.2024.123488

36. Hollebeek, L. D., Glynn, M. S., & Brodie, R. J. (2014). Consumer brand engagement in social media: Conceptualization, scale development and validation. Journal of Interactive Marketing, 28(2), 149–165. https://doi.org/10.1016/j.intmar.2013.12.002

37. Hruby, Watkins-Mathys, & Hanke (2016). Antecedents and outcomes of a global mindset: A thematic analysis of research from 1994 to 2013 and future research agenda. In Advances in global leadership (Vol. 9, pp. 213–280). Emerald Group Publishing Limited. https://doi.org/10.1108/S1535-120320160000009008

38. infodesk. (2024). Market insights strategy: Traditional research vs. AI-Driven analysis. https://www.infodesk.com/blog/market-insights-strategy-traditional-research-vs.-ai-driven-analysis

39. Kirzner, I. M. (1979). Perception, opportunity, and profit: Studies in the theory of entrepreneurship. University of Chicago Press.

40. Krueger, N. F. (2007). What lies beneath? The experiential essence of entrepreneurial thinking. Entrepreneurship Theory and Practice, 31(1), 123–138. https://doi.org/10.1111/j.1540-6520.2007.00166.x

41. Kuratko, D. F., Fisher, & Audretsch, D. B. (2020). Unraveling the entrepreneurial mindset. Small Business Economics, 57(4), 1681–1691. https://doi.org/10.1007/s11187-020-00372-6

42. Lewis, Perez, Piktus, Petroni, Karpukhin, Goyal, Küttler, Lewis, Yih, Rocktäschel, Riedel, & Kiela (2021). Retrieval-augmented generation for knowledge-intensive NLP tasks (arXiv:2005.11401). arXiv. https://doi.org/10.48550/arXiv.2005.11401

43. Loevinger (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3(3), 635–694. https://doi.org/10.2466/pr0.1957.3.3.635

44. London, J. S., Bekki, J. M., Brunhaver, S. R., Carberry, A. R., & McKenna, A. F. (2018). A framework for entrepreneurial mindsets and behaviors in undergraduate engineering students: Operationalizing the Kern Family Foundation’s “3Cs”. Advances in Engineering Education, 7(1), 1–12. ERIC Number: EJ1199586. Available from Advances in Engineering Education (AEE).

45. Macdonald, Adeloye, Sheikh, & Rudan (2023). Can ChatGPT draft a research article? An example of population-level vaccine effectiveness analysis. Journal of Global Health, 13, Article 01003. https://doi.org/10.7189/jogh.13.01003

46. Macpherson, & Jones (2010). Editorial: Strategies for the development of International Journal of Management Reviews. International Journal of Management Reviews, 12(2), 107–113. https://doi.org/10.1111/j.1468-2370.2010.00282.x

47. Mathisen, J. E., & Arnulf, J. K. (2013). Competing mindsets in entrepreneurship: The cost of doubt. The International Journal of Management Education, 11(3), 132–141. https://doi.org/10.1016/j.ijme.2013.03.003

48. Mathisen, J. E., & Arnulf, J. K. (2014). Entrepreneurial mindsets: Theoretical foundations and empirical properties of a mindset scale. Academy of Management Proceedings, 1(5), 81–97. https://doi.org/10.5465/AMBPP.2012.13739abstract

49. Morgado, F. F. R., Meireles, J. F. F., Neves, C. M., Amaral, A. C. S., & Ferreira, M. E. C. (2018). Scale development: Ten main limitations and recommendations to improve future research practices. Psicologia: Reflexão e Crítica: revista semestral do Departamento de Psicologia da UFRGS, 30(1), 3. https://doi.org/10.1186/s41155-016-0057-1

50. Morgan, R. M., & Hunt, S. D. (1994). The commitment-trust theory of relationship marketing. Journal of Marketing, 58(3), 20–38. https://doi.org/10.2307/1252308

51. Naumann (2017). Entrepreneurial mindset: A synthetic literature review. Entrepreneurial Business and Economics Review, 5(3), 149–172. https://doi.org/10.15678/EBER.2017.050308

52. Netemeyer, Bearden, & Sharma (2003). Scaling procedures: Issues and applications. Sage Publications, Inc. https://doi.org/10.4135/9781412985772

53. NielsenIQ. (2024). The rise of synthetic respondents in market research. NielsenIQ. https://nielseniq.com/global/en/insights/education/2024/the-rise-of-synthetic-respondents/

54. Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). McGraw-Hill.

55. Parasuraman, Zeithaml, V. A., & Berry, L. L. (1988). Communication and control processes in the delivery of service quality. Journal of Marketing, 52(2), 35–48. https://doi.org/10.2307/1251263

56. Parmar (2024). AI for market research: Use cases, benefits, and implementation. Prismetric. https://www.prismetric.com/ai-in-market-research/

57. Robinson, P. B., & Gough (2020). The right stuff: Defining and influencing the entrepreneurial mindset. Journal of Entrepreneurship Education, 23(2), 1–16.

58. Rossi, Mukkamala, R. R., Thatcher, J. B., & Dwivedi, Y. K. (2024). Augmenting research methods with foundation models and generative AI. International Journal of Information Management, 77, Article 102749. https://doi.org/10.1016/j.ijinfomgt.2023.102749

59. Russell-Lasalandra, L. L., Christensen, A. P., & Golino (2024). Generative psychometrics via AI-GENIE: Automatic item generation and validation via network-integrated evaluation. https://doi.org/10.31234/osf.io/fgbj4

60. Shekhar, Huang-Saad, & Libarkin (2019). Developing a conceptual framework to understand student participation in entrepreneurship education programs. In American Society for Engineering Education Annual Conference & Exposition, Columbus, Ohio, USA, 25–28 June 2017. Materials science & engineering collection; ProQuest central. https://ezproxy.auckland.ac.nz/login?url=https://search.proquest.com/docview/2314026074?accountid=8424

61. SurveySensum. (2024). 11 best AI survey tools in 2025. https://www.surveysensum.com/blog/ai-survey-tools

62. Tang, Kacmar, K. M., & Busenitz (2012). Entrepreneurial alertness in the pursuit of new opportunities. Journal of Business Venturing, 27(1), 77–94. https://doi.org/10.1016/j.jbusvent.2010.07.001

63. Tranfield, Denyer, & Smart (2003). Towards a methodology for developing evidence-informed management knowledge by means of systematic review. British Journal of Management, 14(3), 207–222. https://doi.org/10.1111/1467-8551.00375

64. Weidinger, Mellor, Rauh, Griffin, Uesato, Huang, P.-S., Cheng, Glaese, Balle, Kasirzadeh, Kenton, Brown, Hawkins, Stepleton, Biles, Birhane, Haas, Rimell, Hendricks, L. A., ... Gabriel (2021). Ethical and social risks of harm from language models (arXiv:2112.04359). arXiv. https://doi.org/10.48550/arXiv.2112.04359

65. Zahoor, Al‐Tabbaa, Khan, & Wood (2020). Collaboration and internationalization of SMEs: Insights and recommendations from a systematic review. International Journal of Management Reviews, 22(4), 427–456. https://doi.org/10.1111/ijmr.12238

66. Zappe, S. E. (2018). Avoiding construct confusion: An attribute-focused approach to assessing entrepreneurial mindset. Advances in Engineering Education, 7(1), 1–12. ERIC Number: EJ1199590. Retrieved from https://eric.ed.gov/?id=EJ1199590

67. Zhuo, T. Y., Huang, Chen, & Xing (2023). Red teaming ChatGPT via jailbreaking: Bias, robustness, reliability and toxicity (arXiv:2301.12867). arXiv. https://doi.org/10.48550/arXiv.2301.12867

Using Generative AI to Enhance Psychometric Scale Development in Market Research
