Abstract
This article presents the design, construct validation, and reliability of a self-report instrument in Spanish that aims to characterize different types of strategies that students can use to learn computer programming. We provide a comprehensive overview of the identification of learning strategies in the existing literature, the design and development of preliminary questionnaire items, the refinement of item wording, and the examination of the internal structure and reliability of the final instrument. The construction of the items was based on the educational theory of Self-Regulated Learning. The final version of the questionnaire, called the Computer Programming Learning Strategies Questionnaire (CEAPC), was administered to 647 students enrolled in computer programming courses. The data collected from the participants were used to examine the construct validity and reliability of the questionnaire. The CEAPC consists of 13 subscales, each corresponding to a different type of learning strategy, and a total of 89 items. Statistical analyses of the data indicate that the CEAPC has adequate construct validity. In addition, the results of the internal consistency analysis indicate satisfactory reliability across the different subscales of the instrument. This study contributes to the field of educational research, particularly in the area of self-regulated learning in computer programming.
Introduction
The widespread use of information technologies has led to computer programming becoming prominent not only in the software industry, but also in academia. As a result, numerous resources and websites have emerged to facilitate programming education and are accessible to anyone interested in acquiring this knowledge (Loksa & Ko, 2016). In addition, computer programming has been incorporated into the curricula of many universities, particularly within engineering programs. Despite this, there is evidence of significant dropout rates in subjects related to computer programming, occurring primarily during students' initial exposure to the subject matter (Juarez-Ramirez et al., 2018; Ramalingam & Wiedenbeck, 1998).
Several studies have been conducted to examine the factors that contribute to dropout rates in these subjects. The findings suggest that students, teachers, and tutors perceive programming as challenging, which often leads to difficulties in achieving satisfactory results within a limited timeframe (Tek et al., 2018). Given these challenges, it is crucial to support students in their learning processes, which requires an understanding of their self-regulation strategies. In other words, it is essential to assess their motivation levels and the learning strategies they employ (McDougall et al., 2016). Self-regulation in the context of learning refers to a self-directed process in which learners control their cognition, motivation, and behavior to achieve learning goals (Zimmerman, 1989).
In particular, the learning of computer programming has been extensively studied under the approach of self-regulation in learning through the use of self-report instruments. These instruments are questionnaires or item inventories that inquire about the learning strategies used by a student (Winne & Perry, 2000). In this context, learning strategies are defined as mental operations and behaviors that the person performs to facilitate the learning process, and their use is intended to affect how the person selects, acquires, organizes, or integrates new knowledge (Weinstein & Mayer, 1986). One difficulty, however, is that these tools are broad in scope and do not focus on a specific subject or domain of knowledge, but rather consider general learning strategies that may be appropriate for students in any subject area. Given the peculiarities of learning to program, it is therefore desirable to have an instrument specific to this field, especially since, at the time of writing this article, no validated questionnaire to characterize learning strategies in computer programming was available in the literature. Thus, this work aims to answer the following research question: How can the learning strategies of computer programming students be characterized using a self-report questionnaire? Accordingly, the purpose of this work is to document the design and construct validation of a self-report instrument in Spanish, called CEAPC, which stands for Cuestionario sobre Estrategias de Aprendizaje de la Programación de Computadores (Questionnaire on Learning Strategies in Computer Programming). It was developed in response to the need to characterize the strategies that can positively influence students' learning processes and their academic performance in the learning of computer programming.
Related Works
Self-report instruments are commonly used to assess self-regulated learning (SRL) because they directly measure how students regulate their own learning. Unlike other measurement protocols such as teacher observations or performance assessments, self-report instruments provide direct information about students' self-regulation processes (Winne & Perry, 2000). Examples of alternative measurement methods include think-aloud protocols used by authors such as Loksa and Ko (2016), visual activity logs used by Cheng et al. (2019), and activity logs collected by learning management systems (LMS) analyzed by Cicchinelli et al. (2018).
Self-report instruments aim to characterize aspects such as motivation, metacognitive strategies, and cognitive strategies that students use when learning a subject. These instruments present statements for individuals to rate and indicate the extent to which these statements apply to them (González-Torres & Torrano, 2012). Well-known self-report instruments for assessing SRL include the Motivated Strategies for Learning Questionnaire (MSLQ) (Pintrich et al., 1993), the Learning and Study Strategies Inventory (LASSI) (Weinstein et al., 1988), the Metacognitive Awareness Inventory (MAI) (Schraw & Dennison, 1994), the Study Process Questionnaire (SPQ) (Biggs, 1987), and the Approaches to Learning and Studying Inventory (ALSI) (Entwistle & McCune, 2004), among others. However, these prominent instruments were not specifically designed for a particular domain of knowledge and often do not consider the use of information and communication technologies as learning resources. This is due to the fact that most of these instruments were developed in the 1980s and 1990s (Garcia et al., 2018).
These instruments are valuable for assessing SRL because of their ease of administration and scoring, their ability to provide a comprehensive view of students' SRL behaviors and strategies, and their potential to track students' SRL development over time (Schellings, 2011; Winne & Perry, 2000). As a result, their use has spread to various domains of knowledge. In the field of computer programming, studies have identified various factors that influence learning outcomes, including motivation, cognition, attitudes, and self-efficacy. Two types of research can be distinguished: those that use or adapt the validated instruments mentioned above, and those that propose new instruments specifically tailored to computer programming research.
In the first type of research, notable work has been conducted by Bergin et al. (2005) using the MSLQ questionnaire. Their findings revealed that strategies such as organizing and generating ideas were not frequently used by students. However, they found that motivational aspects played a crucial role in positively influencing academic performance. Similarly, Tsai (2019) focused on self-efficacy expectancy and emphasized its importance. According to Bandura (2006), self-efficacy refers to an individual's confidence in performing a particular task, such as learning computer programming concepts. In addition, Castellanos, Restrepo-Calle, González, and Ramírez-Echeverry (2017) and Ramírez-Echeverry et al. (2018) used the MSLQ-Colombia instrument, an adapted version of the MSLQ in Spanish, to assess engineering students in computer programming courses (Ramírez-Echeverry et al., 2016). The first study concluded that the quality of students' source code was not only associated with academic performance, but also positively correlated with motivational characteristics related to self-regulation, such as learning beliefs, task value, and self-efficacy. However, in the second study, learning strategies, as measured by the MSLQ-Colombia, did not show a significant correlation with academic performance.
Regarding the second type of research, which involves proposing new instruments, it is worth mentioning studies that have introduced instruments to characterize aspects related to the motivation to learn computer programming. For example, Ramalingam and Wiedenbeck (1998) developed a scale to assess the motivation and persistence levels of students studying object-oriented programming using the C++ programming language. The goal was to understand how students' motivation and persistence are affected in challenging situations, including distractions and uninteresting learning scenarios. In addition, Cetin and Ozden (2015) designed an attitude questionnaire with three dimensions: affect, cognition, and behavior. This instrument aimed to highlight the importance of motivational aspects, especially self-efficacy, in the process of learning computer programming. Furthermore, Tsai et al. (2019) developed the Computer Programming Self-Efficacy Scale (CPSES) based on the computational thinking framework. The CPSES consisted of five subscales with a total of 16 items that assessed students' beliefs about their own abilities in logical thinking, algorithm development, debugging, control, and collaboration. In contrast, the present study focuses specifically on exploring the learning strategies that students use to regulate their own learning process, which distinguishes it from the aforementioned works that primarily examine motivation-related aspects.
Based on the aforementioned research, and considering both types of studies, it becomes clear that there is a need to develop instruments that specifically characterize learning strategies relevant to the domain of computer programming. To the best of our knowledge, the existing studies have either proposed instruments to assess motivation to learn or have used instruments with a broad scope that do not specifically target the unique challenges of learning computer programming. This limitation may explain the difficulty in establishing clear relationships between the use of learning strategies and the academic performance of students in computer programming courses. To address this gap, not only should the relevant literature in the field of self-regulated learning be considered; work in the broader field of computational thinking can also play a crucial role in devising learning strategies that capture the specific demands and intricacies of programming education.
Among the computational thinking practices proposed by Weintrop et al. (2016), computational problem-solving practices stand out. These practices include methods that have been shown to be effective in solving problems using machines and other computational tools. This set of practices comprises seven components, including: preparing problems for computational solutions, programming, selecting appropriate computational tools, assessing different solutions, developing modular computational solutions, creating computational abstractions, and troubleshooting and debugging. Notably, a recent study by Cruz Castro et al. (2021) examined computational problem-solving practices in a first-year undergraduate engineering course. The research highlights the importance of troubleshooting and debugging as a critical practice in introductory courses that facilitates the successful and timely development of other computational thinking skills.
Materials and Methods
Research Design
To achieve the research objective, a methodology based on the self-report instrument construction scheme proposed by Carretero-Dios and Pérez (2005) was used, which consists of four stages:
1. Stage I: Identification of subscales through a comprehensive literature review.
2. Stage II: Design and construction of items with the collaboration of subject matter experts through focus groups.
3. Stage III: Initial Exploration - Analysis of the internal structure of the preliminary version of the instrument through Exploratory Factor Analysis (EFA) and semi-structured interviews with computer programming students.
4. Stage IV: Final Exploration - Analysis of the internal structure of the final version of the instrument using EFA and reliability analysis.
Figure 1 provides a detailed overview of the four stages of the methodology, along with the specific activities performed in each stage.
Figure 1. Proposed methodology for the construction of the self-report instrument.
Stage I: Identification of Subscales
This stage includes the justification of the study and the conceptual delineation of the construct to be evaluated. A thorough literature review was conducted, focusing on key articles and studies related to self-regulation in computer programming learning. Through this review, we identified learning strategies that have been studied in educational contexts related to computer programming, as well as the impact of these strategies on learning outcomes. This process of identifying strategies in the literature served as the basis for proposing groupings to form the subscales of our instrument.
Table 1. Categories of self-regulated learning strategies in programming, adapted from Garcia et al. (2018) and used in this paper.
Table 2. Preliminary subscales for the self-report instrument for computer programming.
Table 2 presents the name assigned to each preliminary subscale in the first column, followed by its description in the second column. The third column indicates, with a check mark, whether it is a new subscale specific to computer programming; the absence of a check mark indicates that the subscale is adapted from the self-regulated learning approach shared with instruments such as the MSLQ, the MSLQ-Colombia, or the definitions of Garcia et al. (2018). The fourth column indicates the number of proposed items for the subscale (107 items in total). Finally, the fifth column lists the sources used to identify the learning strategies corresponding to each subscale.
Stage II: Design and Construction of Items
This stage involved the construction and implementation of a qualitative evaluation of the items in the preliminary instrument. The wording of each item was based on describing a technique that students can use to apply the corresponding learning strategy identified in Stage I (as proposed in the subscales in Table 2). For example, for the “Problem Solving” subscale, the items were worded as follows: (1) I engage in solving problems from different references, such as books or web platforms, and (2) I review examples of solved programming problems (Caruso et al., 2011).
Focus Groups
Before administering the preliminary questionnaire, we conducted a qualitative evaluation of the proposed wording of the items using the focus group method. The focus groups included three computer programming professors and five to seven master's and doctoral students. All had previous experience in designing data collection instruments for engineering education research, indicating their familiarity with the subject matter and their ability to provide valuable insights.
The focus groups were conducted over a period of 2 months, specifically in March and April 2020. A total of six focus group sessions were conducted. Each session lasted approximately 2 hours, allowing ample time for in-depth discussion and exploration of the topic. The focus group sessions were conducted remotely using Google Meet, a video conferencing platform. This remote format allowed for convenient participation without the need for physical presence during the COVID-19 pandemic lockdown. In addition, the use of Google Meet facilitated the recording of all participant interventions for later analysis, with participant consent. The principal investigator of the study, who is also the first author of the paper, served as the moderator for all focus group sessions. As moderator, the principal investigator facilitated the discussions, maintained the flow of conversation, and ensured that all participants had the opportunity to contribute their thoughts and ideas.
The main outcomes of the focus groups were suggestions for improving the wording of the items and recommendations for creating new items. For example, for the “Practice/Training” subscale, the experts indicated that it was important to include techniques such as repeating exercises, suggesting one’s own exercises, and using competitive programming platforms such as Codeforces or HackerRank. Similarly, for the “Resources - Digital Tools” subscale, the experts suggested techniques such as using online programming platforms and reviewing solved programming problems that explain the solution step by step. In summary, the focus groups allowed the items to be improved in both content and form.
Stage III: Initial Exploration - Analysis of the Internal Structure of the Preliminary Version of the Instrument
At this stage, the construct validity of the preliminary instrument was evaluated through a statistical analysis of the data obtained from the first administration. The instrument was administered to students enrolled in Computer Programming (CP), Object-Oriented Programming (OOP), and Data Structures (DS) courses at the National University of Colombia - Bogotá campus during the first semester of 2020. A Google Forms survey was used, which included the informed consent, instructions for completion, an explanation of the Likert scale used, and the instrument items. A 5-point Likert scale was used for the instrument with the following values: 5 - I totally agree, 4 - I agree, 3 - I neither agree nor disagree, 2 - I disagree, and 1 - I totally disagree. This scale was chosen for its readability, which is expected to increase the response rate of the questionnaire (Dawes, 2008). The items were randomly ordered to ensure separation of items belonging to each subscale. This version of the instrument took approximately 45–60 minutes to complete.
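Although the study does not detail the exact randomization procedure, a constrained shuffle of this kind can be sketched in a few lines of Python; the item identifiers and subscale names below are hypothetical placeholders, not the actual CEAPC items:

```python
import random

# Hypothetical item pool: (item_id, subscale) pairs standing in for
# the real questionnaire items.
items = [(1, "Peer Learning"), (2, "Peer Learning"),
         (3, "Effort"), (4, "Effort"),
         (5, "Practice"), (6, "Practice")]

def shuffle_separating_subscales(pool, max_tries=1000):
    """Shuffle the pool, retrying until no two consecutive items
    belong to the same subscale (best effort within max_tries)."""
    for _ in range(max_tries):
        order = random.sample(pool, len(pool))
        if all(a[1] != b[1] for a, b in zip(order, order[1:])):
            return order
    return order  # fall back to the last attempt

for item_id, subscale in shuffle_separating_subscales(items):
    print(item_id, subscale)
```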
Participants
The preliminary version of the instrument was completed by a total of 244 participants, distributed among the different courses as follows: 178 students (73%) in CP, 44 students (18%) in OOP, and 22 students (9%) in DS. It is important to note that the CP course is a prerequisite for taking OOP, and OOP is a prerequisite for taking DS. Consequently, students enrolled in DS, being further along in their studies, may have had more experience in computer programming compared to those enrolled in CP and OOP.
Internal Structure of the Preliminary Version of the Instrument
First, prior to conducting the Exploratory Factor Analysis (EFA), we performed Bartlett’s test of sphericity and calculated the Kaiser-Meyer-Olkin index (KMO) to determine whether the data collected in this application of the instrument were suitable for factoring. Bartlett’s sphericity test assesses whether the correlation matrix of the variables is significantly different from an identity matrix. In other words, it tests whether there are significant relationships between the variables. If the test results in a statistical significance of less than 0.05, it indicates that the null hypothesis can be rejected, suggesting that the variables are not independent and can be factored. The KMO index is a measure of sampling adequacy used to determine the suitability of data for factor analysis. It assesses the extent to which the observed variables share common variance and can be factored. The KMO index ranges from 0 to 1, with values closer to 1 indicating better suitability for factor analysis.
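As an illustration, both checks can be computed with the Python factor_analyzer package. The following is a minimal sketch assuming the survey responses are loaded into a pandas DataFrame (the file name is a placeholder); it is not the analysis code used in the study:

```python
import pandas as pd
from factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

# One row per student, one column per item (Likert values 1-5);
# the file name is hypothetical.
responses = pd.read_csv("ceapc_responses.csv")

chi_square, p_value = calculate_bartlett_sphericity(responses)
kmo_per_item, kmo_overall = calculate_kmo(responses)

# p < 0.05 rejects the identity-matrix hypothesis; a KMO close to 1
# indicates that the data are suitable for factor analysis.
print(f"Bartlett: chi2 = {chi_square:.1f}, p = {p_value:.4f}")
print(f"Overall KMO = {kmo_overall:.3f}")
```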
First application - Results of Bartlett’s test of sphericity and calculation of the Kaiser-Meyer-Olkin index (KMO).
The internal structure of the preliminary version of the instrument was determined by means of EFA, using principal axis factoring as the factor extraction method and the Oblimin rotation method. EFA helps to identify the underlying factors or dimensions that explain the patterns of correlation among the observed variables; by conducting it, it was possible to gain insight into how different items are related and to group them based on common factors.

The Oblimin rotation method simplifies the interpretation of the factor structure obtained through EFA by allowing the factors to be correlated with each other. In this case, the objective of using Oblimin rotation was to maximize the high factor loadings of the items and minimize the low ones, making it easier to interpret the relationships between the variables and assign them to specific factors (Watson, 2017).

In a factor analysis, the factor loading represents the correlation between each observed variable (item) and the underlying factor, indicating the strength and direction of their relationship. Higher factor loadings suggest that the item is more representative of the underlying factor, while lower loadings suggest that the item may not be well represented by it. In essence, factor loadings help determine the contribution of each item to the extracted factors. In this study, an item was considered to belong to a factor if its factor loading was greater than or equal to 0.3, a threshold considered acceptable in the literature (Hair et al., 2006).
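For concreteness, the sketch below shows how an EFA of this kind could be run with the same factor_analyzer package, reusing the hypothetical `responses` DataFrame from the previous snippet and applying the 0.3 threshold (n_factors is set to 15, the number fixed in advance for the preliminary analysis, as described next). It illustrates the procedure rather than reproducing the study's actual analysis:

```python
import numpy as np
from factor_analyzer import FactorAnalyzer

# Principal axis factoring with Oblimin rotation; n_factors=15 mirrors
# the number of factors fixed in advance for the preliminary analysis.
fa = FactorAnalyzer(n_factors=15, method="principal", rotation="oblimin")
fa.fit(responses)

loadings = fa.loadings_  # shape: (n_items, n_factors)

# Assign each item to every factor on which it loads at least 0.3,
# the acceptance threshold adopted in the study (Hair et al., 2006).
for i, item_loadings in enumerate(loadings, start=1):
    factors = np.where(np.abs(item_loadings) >= 0.3)[0] + 1
    print(f"item {i}: factors {factors.tolist()}, "
          f"max |loading| = {np.max(np.abs(item_loadings)):.2f}")
```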
After conducting the EFA, several notable findings emerged. First, the number of factors obtained did not match the 16 expected subscales originally proposed in the preliminary design of the instrument. In addition, certain items were grouped into unexpected factors rather than the intended ones, as evidenced by similar factor loadings (around 0.3) across multiple factors. As a result, it was decided to constrain the analysis by fixing 15 factors in advance for the EFA. The goal was to maximize the number of items grouped within factors that were consistent with the theoretical basis of the intended learning strategies characterized by the questionnaire. In summary, the EFA yielded the following results:
• Four factors were consistent with the expected subscales: Resource - Study Environment and Time, Peer Learning, Practice and Training, and Resource - Effort. Each of these factors included the items intended for their respective subscales.
• The subscale Use of Digital Tools was no longer distinct because its items were grouped into different factors.
• Four unexpected factors emerged, representing new subscales for the instrument. These factors, namely Problem Solving, Consult Information, Consult Documentation, and Consult Study Resources, showed consistency when analyzed within the theoretical framework of self-regulated learning in the context of computer programming.
• The remaining six factors reflected partial groupings of items intended for single subscales or included items from two or more subscales.
The results revealed instances where factors grouped items from different strategies without a plausible theoretical explanation for such grouping. In addition, certain items continued to belong to multiple factors, with substantial factor loadings on each. As a result, 39 of the original 107 items in the preliminary version of the instrument were identified as problematic and required adjustments in wording. In order to explore the possible reasons for these unexpected results and to improve the wording of the problematic items, the subsequent phase of this research involved a qualitative analysis through semi-structured interviews with students.
Semi-Structured Interviews
The qualitative analysis of the problematic factors and items of the preliminary version of the instrument was conducted through semi-structured interviews. These types of interviews, as well as other qualitative data collection methods, provide insight into a person’s worldview and thus the subtle details that may be difficult to understand using only quantitative methods (Nassar-McMillan et al., 2010). This approach allows for a flexible yet focused conversation between the interviewer and the interviewee. The choice of this method is consistent with the goal of gaining insight into students' interpretations of the content of the problematic items.
For this research, semi-structured interviews were conducted with 35 students who had previously completed the preliminary version of the self-report instrument and voluntarily agreed to participate in this phase of the study. These students were selected because they had first-hand experience with the instrument and could provide valuable perspectives on its content. Interviews were conducted between June and July 2020.
Each interview focused on discussing the interpretation of a subset of the problematic items and involved a conversation between the principal investigator and a computer programming student, guided by pre-established questions, other questions that arose during the activity, and the items being evaluated. The number of items discussed in each interview varied between 4 and 6, depending on the length of the conversation between the interviewer and the interviewee. Interviews typically lasted between 45 and 60 minutes, allowing sufficient time for in-depth exploration of participants' perspectives.
To facilitate qualitative data collection, an interview guide was developed. The guide consisted of a set of core questions for the interviewees. In addition, supplementary questions were included before or during the interview to explore specific aspects related to each item. The core questions for each item were as follows:
1. When you read the following sentence or item, what thoughts or interpretations come to mind?
2. Based on your previous answer, how would you improve this sentence to make it easier to understand?
3. Do you use this learning strategy when you study programming? If not, do you use a similar approach?
The semi-structured interviews were also conducted remotely using Google Meet, which allowed for easy recording of the interviews with the consent of the participants. This approach facilitated the analysis of the interviews at a later stage. The recorded interviews were analyzed using thematic analysis. Through systematic coding, meaningful units of data were identified and labeled as codes derived from the words and phrases used by the participants. Common themes and patterns emerged from the coded data, representing key ideas and experiences. The researchers analyzed and interpreted the data within each theme, exploring connections, identifying evidence, and uncovering deeper meanings and implications.
By conducting semi-structured interviews, we were able to delve into the students' perspectives and gain a deeper understanding of their interpretations of the items. These insights provided valuable evidence for adjusting and refining the wording of the items to ensure greater clarity and accuracy in the subsequent version of the instrument. For example, students pointed out instances where the same word or phrase in the items was interpreted differently by different people. A specific example mentioned was the term “algorithmic elements,” which was originally used to refer to programming concepts such as variables, control structures, and methods. However, students found this term unclear and suggested that it be replaced with the phrase “programming concepts” for clarity. In addition, students pointed out ambiguities in certain words, such as the term “schema”. According to their feedback, they understood “schema” to be either a graphical method for organizing information or a method for modeling the solution to a problem before coding, especially during the design phase. This feedback highlighted the need for clearer and more precise language in the items to avoid confusion and ensure consistent interpretation across participants.
Adjustments to the Wording of Items or Deletion of Items
Based on the results of the semi-structured interviews, necessary adjustments were made to several problematic items identified in the initial application of the self-report instrument. In addition, the decision was made to eliminate 14 items. This elimination was based on observations gathered during the semi-structured interviews, as well as criteria such as the similarity of certain items, the difficulty in understanding certain items, and the presence of items that were grouped into multiple subscales. As a result of this analysis, a revised version of the instrument was obtained, consisting of a total of 93 items.
Stage IV: Final Exploration - Analysis of the Internal Structure of the Final Version of the Instrument
In this stage, the adjusted version of the instrument was evaluated to obtain the final version. The evaluation focused on studying the dimensionality or internal structure of the instrument and estimating its reliability. To achieve this, the instrument was administered for the second time using Google Forms and consisted of the 93 items derived from the previous stage. The items were randomly ordered to minimize response bias.
Participants
In this application, 647 students completed the questionnaire, more than two and a half times the number in the first application. A total of 427 students (66%) were enrolled in CP, 181 students (28%) in OOP, and 39 students (6%) in DS. A minimum of 5 participants per item was required according to Carretero-Dios and Pérez (2005); with 93 items, this implies at least 465 respondents, a threshold comfortably exceeded by the sample.
Internal Structure of the Second Version of the Instrument
Second application - Results of Bartlett’s test of sphericity and calculation of the Kaiser-Meyer-Olkin index (KMO).
An exploratory factor analysis was conducted using the principal axis method and Oblimin rotation, similar to the first application. The results showed an improved factor structure of the self-report instrument when analyzed with 13 factors instead of the expected 15. Comparing these results with the EFA results of the preliminary version of the instrument, the following findings stand out:
• In general, the factor loadings of each item within their respective factors increased substantially, and the conceptual relationships among the items grouped within each factor also improved.
• Factors were identified that included all of the expected items for each subscale, such as Peer Learning, Resource - Effort, Organizing Ideas, and Consult Documentation.
• Two factors emerged independently, including items related to Resource - Study Environment and Resource - Time. It is noteworthy that in the first application, only one factor emerged that included items from both of these learning strategies.
• Items related to metacognition (Metacognition - Study Method Adaptation, Metacognition - Learning Planning, and Metacognition - Monitoring of Learning) were combined into a single factor named Metacognition in Learning Computer Programming. This factor contained 16 items with factor loadings ranging from 0.4 to 0.5.
• Most of the items related to problem-solving strategies were grouped into a single factor that included strategies for analyzing, designing, coding, and testing programs. Factor loadings for these problem-solving items ranged from 0.3 to 0.5.
• Some items from the original Problem Solving - Coding subscale generated another factor that grouped items with strategies specific to the coding process during learning. Factor loadings on this factor ranged from 0.4 to 0.6.
• In addition, some items from the Organization of Ideas, Problem Solving - Design, and Problem Solving - Coding subscales were combined into one factor. These items showed a relationship with strategies for organizing important topic ideas (concepts and definitions) and organizing data to solve programming problems through diagrams and comments in the source code.
• Although the first application resulted in the separation of all items from the Acquisition of Declarative Knowledge subscale, in this application all items were grouped into a single factor with loadings around 0.4.
• Two unexpected but conceptually meaningful factors emerged: Programming Problem-Based Learning (items with factor loadings between 0.3 and 0.5) and Monitoring Comprehension in Programming Problem Solving (items with factor loadings between 0.4 and 0.7). Some of these items were originally proposed for the Metacognition - Monitoring of Learning subscale. Both factors showed acceptable factor loadings.
• Two factors were identified independently, including items from Organizing Ideas (loadings between 0.3 and 0.6) and Resource - Effort (loadings between 0.5 and 0.7). These two subscales were initially based on the MSLQ-Colombia.
Detailed results of the final factor structure of the instrument are presented in Figure 2.
Figure 2. Internal structure of the Computer Programming Learning Strategies Questionnaire (CEAPC).
Deletion of Items From the Second Version of the Instrument
The item elimination process followed three criteria: items with factor loading around 0.2 in the expected factor, items with factor loading close to 0.3 in different factors, and lack of conceptual relationship between items within the same factor. Four items were eliminated as a result of this process. It is important to note that after each elimination, the Exploratory Factor Analysis was recalculated, demonstrating that the original 13 factors remained intact and that the remaining items within each factor retained the expected conceptual relationship. The subscales from which items were eliminated were: Problem Solving – Analysis, Problem Solving - Coding, Practice/Training, and Resource - Study Environment and Time.
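A sketch of this iterative procedure, under the same assumptions as the previous snippets (the `responses` DataFrame, 13 factors, and hypothetical column names for the four eliminated items), could look as follows:

```python
from factor_analyzer import FactorAnalyzer

# Remove one item at a time and recompute the EFA, re-inspecting the
# 13-factor structure after every elimination. The column names below
# are hypothetical placeholders for the four eliminated items.
for item in ["psa_07", "psc_12", "prac_03", "env_05"]:
    responses = responses.drop(columns=[item])
    fa = FactorAnalyzer(n_factors=13, method="principal",
                        rotation="oblimin")
    fa.fit(responses)
    # fa.loadings_ is then inspected to confirm that the 13 factors
    # remain intact and conceptually coherent.
```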
The final version of the instrument, consisting of 89 items, is presented below. The statistical analysis was performed with the data from the 647 participating students, without considering the four items that were excluded.
Final Version of the Instrument
Subscales of the Instrument With Conceptual Description
Subscales and number of items of the Computer Programming Learning Strategies Questionnaire (CEAPC).
ᵃ New subscale.
The names of certain subscales changed from those originally proposed. In assigning these names, we considered technical terms used in self-report instruments of self-regulated learning, such as the MSLQ and the MSLQ-Colombia. For example, instead of using the word “resource” with either “effort” or “time”, it was decided to use the term “regulation” in the former case, resulting in “effort regulation”, and the term “management” in the latter case, resulting in “time management.” Similarly, in the subscale “Resource - Study Environment” we used “Management” instead of the word “Resource”, thus obtaining “Study Environment Management”.
Here is the conceptual description of each subscale in the final version of the CEAPC:
Internal Structure of the CEAPC
Figure 2 illustrates the internal structure of the final self-report instrument. The ovals represent the subscales, while the boxes represent the items within each subscale, identified by their corresponding numbers in the self-report instrument. For example, the subscale labeled “Consult Documentation” consists of items numbered 66, 79, and 86 on the instrument. The values displayed next to the arcs emanating from each subscale indicate the factor loadings obtained by the respective items on that particular factor. To illustrate, item 66 obtained a factor loading of 0.711, while items 79 and 86 obtained loadings of 0.806 and 0.637, respectively.
Reliability of the CEAPC
Reliability, as measured by the Cronbach’s alpha coefficient, assesses the internal consistency of an instrument and indicates the extent to which its items consistently measure the same underlying construct (Ursachi et al., 2015). Cronbach’s alpha is a statistical measure that quantifies the interrelationship or correlation among items within a subscale or questionnaire. It produces a value between 0 and 1, with higher values indicating greater internal consistency. In essence, Cronbach’s alpha helps determine the reliability of an instrument in measuring a particular construct.
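Cronbach's alpha follows directly from its definition: alpha = k/(k-1) * (1 - sum of the item variances / variance of the total score), for k items. The self-contained sketch below computes it for one subscale, reusing the hypothetical `responses` DataFrame from the earlier snippets; the column names, modeled on the "Consult Documentation" items 66, 79, and 86 shown in Figure 2, are likewise placeholders:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a (respondents x items) array:
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)
    total_variance = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)

# Example with the three "Consult Documentation" items (numbers 66, 79,
# and 86 in the instrument); the column names are hypothetical.
alpha = cronbach_alpha(responses[["item_66", "item_79", "item_86"]])
print(f"Cronbach's alpha = {alpha:.3f}")
```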
Internal consistency indices for the CEAPC subscales.
CP: Computer Programming.
CPL: Computer Programming Learning.
The subscales with the highest Cronbach's alpha values, around 0.8, consist of items that have remained consistent and unchanged since the first application of the instrument. These subscales include "Study Environment Management", "Time Management", "Effort Regulation", "Practice in CPL", and "Peer Learning in CPL". The "Metacognition" subscale also belongs to this group, with a value of 0.893. These high alpha values indicate a high degree of consistency among the items within these subscales.
Subscales with Cronbach’s alpha values around 0.7 emerged as new subscales in either the first or second version of the instrument. For example, the subscale “Consult documentation” emerged from the first application, and the subscale “Coding” emerged from the second version. In addition, the “Organizing Ideas” subscale, although considered from the first version, had some items that were originally part of the preliminary “Problem Solving - Design” subscale. The “Problem Solving” subscale was formed by grouping items related to analysis, design, coding, and testing strategies during programming problem solving.
Finally, subscales with Cronbach's alpha values around 0.6 were not explicitly considered in the initial research, but emerged from the results of the EFA conducted in the second application. These subscales include "Problem-based learning in computer programming for epistemological competencies" and "Monitoring comprehension when solving computer programming problems". Furthermore, the subscale "Declarative knowledge acquisition" obtained the lowest alpha value of 0.662, but it is still considered acceptable according to the literature (Ursachi et al., 2015). It should be noted that the items of this subscale were originally grouped into different factors in the first version of the instrument. The adjustments made to the wording allowed these items to achieve the necessary consistency to characterize this type of computer programming learning strategy with the CEAPC.
Discussion
In the literature, several self-report instruments have been developed to assess learning processes from the perspective of self-regulated learning. However, most of these instruments have been designed to capture learning as a general process, without considering the specific characteristics of a particular knowledge domain, such as computer programming. During the course of this research, however, several studies were identified that made valuable contributions to the understanding of learning in computer programming. These studies focused on the design and use of self-report instruments to assess a specific aspect of self-regulated learning: motivation to learn (Cetin & Ozden, 2015; Danielsiek et al., 2017; Dorn & Elliott Tew, 2015; Ramalingam & Wiedenbeck, 1998; Tsai et al., 2019). Nevertheless, no instruments were found in the reviewed literature that specifically characterized the strategies used by computer programming students to improve their learning processes. By developing the Computer Programming Learning Strategies Questionnaire (CEAPC) as a self-report instrument, the research question of this study was answered: How can the learning strategies of computer programming students be characterized using a self-report questionnaire? The CEAPC addresses the need for a Spanish-language instrument that can assess the learning strategies that are effective in the context of learning computer programming.
The CEAPC consists of thirteen subscales, comprising a total of 89 items, based on the self-regulated learning approach and focusing on strategies for learning to program. These subscales can be divided into two types: nine subscales that correspond to existing strategies documented in the literature, and four subscales that emerged unexpectedly as new strategies identified in this research.
Finally, it should be noted that the CEAPC represents a practical contribution in the field of learning strategies for computer programming. It uses a questionnaire to assess and characterize these strategies. While some of the strategies included in the instrument are in line with the existing literature on self-regulated learning, others have emerged directly from the observations and experiences of students learning computer programming. These strategies have been identified by students themselves as valuable and effective in learning programming. Of particular note are those strategies that utilize additional resources such as digital tools, reflecting the evolving nature of learning in the digital age. By including both established and emerging strategies, the CEAPC provides a comprehensive and practical tool for understanding and evaluating learning strategies in the context of computer programming.
Conclusions and Future Works
This paper described the design, construct validation, and reliability analysis of the Computer Programming Learning Strategies Questionnaire (CEAPC), a self-report instrument for characterizing learning strategies in computer programming. The questionnaire consists of 13 subscales, each representing a specific type of learning strategy, and includes a total of 89 items that assess the techniques used by students to engage in these strategies. The identification of programming learning strategies was based on a detailed literature review, following the framework proposed by Carretero-Dios and Pérez (2005) for the construction of instruments. The development of the CEAPC was influenced by existing self-report questionnaires, such as the MSLQ (Pintrich et al., 1993) and the MSLQ-Colombia (Ramírez-Echeverry et al., 2016), as well as Zimmerman’s categories of self-regulated learning, as redefined by Garcia et al. (2018).
The CEAPC has the potential to become a fundamental reference within the field of educational research in computer programming by providing researchers and practitioners with a tool to characterize the strategies that students use when learning computer programming. This instrument offers valuable insights into various aspects, such as the actions students take to solve programming problems, as well as their planning, monitoring, and controlling of learning processes inside and outside the classroom (metacognitive strategies), among other areas of inquiry. By using the CEAPC, researchers and practitioners can gain a deeper understanding of the learning processes involved in computer programming and identify effective strategies that contribute to students' success in this domain.
The research design used in this study employed mixed methods, combining quantitative and qualitative approaches. Quantitative methods were used to determine the dimensionality of the questionnaire, assess the construct validity of the instrument, and calculate the reliability of the questionnaire subscales using internal consistency measures. On the other hand, qualitative methods were used to refine the wording of the questionnaire items and to gain a deeper understanding of the quantitative results. Semi-structured interviews allowed students to share their experiences regarding the learning strategies they use in computer programming. These qualitative insights were valuable in understanding the emergence of unexpected factors in the exploratory factor analysis, which led to the identification of emergent subscales in the questionnaire. These subscales grouped learning strategies that were not initially anticipated based on the existing literature, indicating unique approaches used by students in the context of computer programming learning.
As future work, it is recommended to expand the participant population to increase diversity and generalize the results beyond the specific context of the study. This expansion would help to further validate the internal structure of the questionnaire, refine item wording, and improve the reliability of the instrument. By including diverse populations, researchers can gather more comprehensive evidence and increase the generalizability of the findings. In addition, a limitation of this study is the time required to administer the CEAPC (between 45 and 60 minutes), which may be a problem for students under time constraints.
Moreover, future studies should explore related concepts that may influence cognitive control processes when learning programming. The use of instruments such as the CEAPC can help to characterize the learning strategies that students use, which in conjunction with information on students’ knowledge of programming, can help to better understand the cognitive control processes in this area of research. By combining data on learning strategies with assessments of students' programming knowledge, researchers can explore how these strategies interact with cognitive processes and knowledge acquisition in computer programming. This integration of information can provide valuable insights into the complex interplay between learning strategies, cognitive control, and programming proficiency, and enhance our understanding of effective instructional approaches and interventions in computer programming education.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
