References

Abelson, H. (1986). Lecture 1A: Overview and introduction to Lisp [lecture transcript]. MIT OpenCourseWare 6.001 Structure and Interpretation of Computer Programs. https://ocw.mit.edu/courses/6-001-structure-and-interpretation-of-computer-programs-spring-2005/resources/1a-overview-and-introduction-to-lisp/

Arshad, A., Ghaleb, T., & Ralph, P. (2021). Towards a more structured peer review process with empirical standards. Proceedings of the 25th International Conference on Evaluation and Assessment in Software Engineering, 353–358. https://doi.org/10.1145/3463274.3463359

Arvanitou, E.-M., Ampatzoglou, A., Chatzigeorgiou, A., & Carver, J. C. (2021). Software engineering practices for scientific software development: A systematic mapping study. Journal of Systems and Software, 172, 110848. https://doi.org/10.1016/j.jss.2020.110848

Barua, A., Thomas, S. W., & Hassan, A. E. (2014). What are developers talking about? An analysis of topics and trends in Stack Overflow. Empirical Software Engineering, 19(3), 619–654. https://doi.org/10.1007/s10664-012-9231-y

Begel, A., & Zimmermann, T. (2014). Analyze this! 145 questions for data scientists in software engineering. Proceedings of the 36th International Conference on Software Engineering, 12–23. https://doi.org/10.1145/2568225.2568233

Beller, M., Spruit, N., Spinellis, D., & Zaidman, A. (2018). On the dichotomy of debugging behavior among programmers. Proceedings of the 40th International Conference on Software Engineering, 572–583. https://doi.org/10.1145/3180155.3180175

Blackburn, S. M. et al. (2006). The DaCapo benchmarks: Java benchmarking development and analysis. Proceedings of the 21st Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications, 169–190. https://doi.org/10.1145/1167473.1167488

Boehm, B. W., Elwell, J. F., Pyster, A. B., Stuckle, E. D., & Williams, R. D. (1982). The TRW software productivity system. Proceedings of the 6th International Conference on Software Engineering, 148–156. https://dl.acm.org/doi/10.5555/800254.807757

Booth, W. C., Colomb, G. G., Williams, J. M., Bizup, J., & FitzGerald, W. T. (2016). The craft of research (4th ed.). University of Chicago Press.

Burns, R. B. (2000). Introduction to research methods (4th ed.). SAGE Publications.

Carrera-Rivera, A., Ochoa, W., Larrinaga, F., & Lasa, G. (2022). How-to conduct a systematic literature review: A quick guide for computer science research. MethodsX, 9, 101895. https://doi.org/10.1016/j.mex.2022.101895

Carvalho, L., Degiovanni, R., Cordy, M., Aguirre, N., Le Traon, Y., & Papadakis, M. (2024). SpecBCFuzz: Fuzzing LTL solvers with boundary conditions. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering. https://doi.org/10.1145/3597503.3639087

Choudhuri, R., Liu, D., Steinmacher, I., Gerosa, M., & Sarma, A. (2024). How far are we? The triumphs and trials of generative AI in learning software engineering. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering. https://doi.org/10.1145/3597503.3639201

Claes, M., Mäntylä, M. V., Kuutila, M., & Adams, B. (2018). Do programmers work at night or during the weekend? Proceedings of the 40th International Conference on Software Engineering, 705–715. https://doi.org/10.1145/3180155.3180193

Creswell, J. W., & Creswell, J. D. (2018). Research design: Qualitative, quantitative, and mixed methods approaches (5th ed.). Sage.

Denning, P. J. (2005). Is computer science science? Communications of the ACM, 48(4), 27–31. https://doi.org/10.1145/1053291.1053309

Dubey, R. K., Thrash, T., Kapadia, M., Hoelscher, C., & Schinazi, V. R. (2021). Information theoretic model to simulate agent-signage interaction for wayfinding. Cognitive Computation, 13(1), 189–206.

Easterbrook, S., Singer, J., Storey, M.-A., & Damian, D. (2008). Selecting empirical methods for software engineering research. In F. Shull, J. Singer, & D. I. K. Sjøberg (Eds.), Guide to advanced empirical software engineering (pp. 285–311). Springer London. https://doi.org/10.1007/978-1-84800-044-5_11

Futatsugi, K., & Okada, K. (1982). A hierarchical structuring method for functional software systems. Proceedings of the 6th International Conference on Software Engineering, 393–402. https://dl.acm.org/doi/10.5555/800254.807782

Gray, J. (1992). Benchmark handbook: For database and transaction processing systems. Morgan Kaufmann Publishers Inc.

Habiba, U., Habib, M. K., Bogner, J., Fritzsch, J., & Wagner, S. (2024). How do ML practitioners perceive explainability? An interview study of practices and challenges. Empirical Software Engineering, 30(1). https://doi.org/10.1007/s10664-024-10565-2

Hall, T., Beecham, S., Bowes, D., Gray, D., & Counsell, S. (2012). A systematic literature review on fault prediction performance in software engineering. IEEE Transactions on Software Engineering, 38(6), 1276–1304. https://doi.org/10.1109/TSE.2011.103

Hoda, R., Noble, J., & Marshall, S. (2013). Self-organizing roles on agile software development teams. IEEE Transactions on Software Engineering, 39(3), 422–444. https://doi.org/10.1109/TSE.2012.30

Huang, Y., Wang, J., Liu, Z., Wang, Y., Wang, S., Chen, C., Hu, Y., & Wang, Q. (2024). CrashTranslator: Automatically reproducing mobile application crashes directly from stack trace. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering. https://doi.org/10.1145/3597503.3623298

Huijgens, H., Rastogi, A., Mulders, E., Gousios, G., & Deursen, A. van. (2020). Questions for data scientists in software engineering: A replication. Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 568–579. https://doi.org/10.1145/3368089.3409717

Huppler, K. (2009). The art of building a good benchmark. In R. Nambiar & M. Poess (Eds.), Performance evaluation and benchmarking (pp. 18–30). Springer Berlin Heidelberg.

Inal, Y., Clemmensen, T., Rajanen, D., Iivari, N., Rizvanoglu, K., & Sivaji, A. (2020). Positive developments but challenges still ahead: A survey study on UX professionals’ work practices. Journal of Usability Studies, 15(4), 210–246.

Inayat, I., Salim, S. S., Marczak, S., Daneva, M., & Shamshirband, S. (2015). A systematic literature review on agile requirements engineering practices and challenges. Computers in Human Behavior, 51, 915–929. https://doi.org/10.1016/j.chb.2014.10.046

Jedlitschka, A., & Pfahl, D. (2005). Reporting guidelines for controlled experiments in software engineering. 2005 International Symposium on Empirical Software Engineering, 1–10. https://doi.org/10.1109/ISESE.2005.1541818

Kabir, S., Udo-Imeh, D. N., Kou, B., & Zhang, T. (2024). Is Stack Overflow obsolete? An empirical study of the characteristics of ChatGPT answers to Stack Overflow questions. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3613904.3642596

Kalliamvakou, E., Gousios, G., Blincoe, K., Singer, L., German, D. M., & Damian, D. (2014). The promises and perils of mining GitHub. Proceedings of the 11th Working Conference on Mining Software Repositories, 92–101. https://doi.org/10.1145/2597073.2597074

Kampenes, V. B., Dybå, T., Hannay, J. E., & Sjøberg, D. I. K. (2007). A systematic review of effect size in software engineering experiments. Information and Software Technology, 49(11), 1073–1086. https://doi.org/10.1016/j.infsof.2007.02.015

Kazemi, M. et al. (2025). BIG-bench extra hard. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 26473–26501. https://doi.org/10.18653/v1/2025.acl-long.1285

Keshav, S. (2007). How to read a paper. ACM SIGCOMM Computer Communication Review, 37(3), 83–84. https://doi.org/10.1145/1273445.1273458

Kitchenham, B. A., Dyba, T., & Jorgensen, M. (2004). Evidence-based software engineering. Proceedings of the 26th International Conference on Software Engineering, 273–281. https://doi.org/10.1109/ICSE.2004.1317449

Kitchenham, B., & Charters, S. (2007). Guidelines for performing systematic literature reviews in software engineering (Technical Report EBSE-2007-01). Keele University; Durham University Joint Report. https://legacyfileshare.elsevier.com/promis_misc/525444systematicreviewsguide.pdf

Kounev, S., Lange, K.-D., & Kistowski, J. von. (2025). Systems benchmarking: For scientists and engineers (2nd ed.). Springer. https://doi.org/10.1007/978-3-031-85634-1

Krause, A., Kaur, H., Klemmer, J. H., Wiese, O., & Fahl, S. (2025). “That’s my perspective from 30 years of doing this”: An interview study on practices, experiences, and challenges of updating cryptographic code. 34th USENIX Security Symposium, 2907–2926.

Lamport, L. (2012). How to write a 21st century proof. Journal of Fixed Point Theory and Applications, 11(1), 43–63.

Lawrance, J., Bogart, C., Burnett, M., Bellamy, R., Rector, K., & Fleming, S. D. (2013). How programmers debug, revisited: An information foraging theory perspective. IEEE Transactions on Software Engineering, 39(2), 197–215. https://doi.org/10.1109/TSE.2010.111

Miao, X., Wu, Y., Chen, L., Gao, Y., & Yin, J. (2023). An experimental survey of missing data imputation algorithms. IEEE Transactions on Knowledge and Data Engineering, 35(7), 6630–6650. https://doi.org/10.1109/TKDE.2022.3186498

Munaiah, N., Kroh, S., Cabrey, C., & Nagappan, M. (2017). Curating GitHub for engineered software projects. Empirical Software Engineering, 22(6), 3219–3253. https://doi.org/10.1007/s10664-017-9512-6

Nakamoto, Y., Iwamoto, T., Hori, M., Hagihara, K., & Tokura, N. (1982). An editor for documentation in π-system to support software development and maintenance. Proceedings of the 6th International Conference on Software Engineering, 330–339. https://dl.acm.org/doi/10.5555/800254.807775

OECD. (2015). Frascati manual 2015: Guidelines for collecting and reporting data on research and experimental development (p. 398). OECD Publishing. https://doi.org/10.1787/9789264239012-en

Park, J. S., O’Brien, J., Cai, C. J., Morris, M. R., Liang, P., & Bernstein, M. S. (2023). Generative agents: Interactive simulacra of human behavior. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. https://doi.org/10.1145/3586183.3606763

Rothlisberger, D., Harry, M., Binder, W., Moret, P., Ansaloni, D., Villazon, A., & Nierstrasz, O. (2012). Exploiting dynamic information in IDEs improves speed and correctness of software maintenance tasks. IEEE Transactions on Software Engineering, 38(3), 579–591. https://doi.org/10.1109/TSE.2011.42

Saunders, B., Sim, J., Kingstone, T., Baker, S., Waterfield, J., Bartlam, B., Burroughs, H., & Jinks, C. (2018). Saturation in qualitative research: Exploring its conceptualization and operationalization. Quality & Quantity, 52(4), 1893–1907. https://doi.org/10.1007/s11135-017-0574-8

Shahin, M., Liang, P., & Babar, M. A. (2014). A systematic review of software architecture visualization techniques. Journal of Systems and Software, 94, 161–185. https://doi.org/10.1016/j.jss.2014.03.071

Shreeve, B., Gralha, C., Rashid, A., Araújo, J., & Goulão, M. (2023). Making sense of the unknown: How managers make cyber security decisions. ACM Transactions on Software Engineering and Methodology, 32(4). https://doi.org/10.1145/3548682

Steimann, F. (2018). Fatal abstraction. Proceedings of the 2018 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software, 125–130. https://doi.org/10.1145/3276954.3276966

Stol, K.-J., & Fitzgerald, B. (2018). The ABC of software engineering research. ACM Transactions on Software Engineering and Methodology, 27(3). https://doi.org/10.1145/3241743

Tabassum, S., Pereira, F. S. F., Fernandes, S., & Gama, J. (2018). Social network analysis: An overview. WIREs Data Mining and Knowledge Discovery, 8(5), e1256. https://doi.org/10.1002/widm.1256

Vidoni, M. (2022). A systematic process for mining software repositories: Results from a systematic literature review. Information and Software Technology, 144, 106791. https://doi.org/10.1016/j.infsof.2021.106791

Wobbrock, J. O., & Kientz, J. A. (2016). Research contributions in human-computer interaction. Interactions, 23(3), 38–44. https://doi.org/10.1145/2907069

Wohlin, C., & Aurum, A. (2015). Towards a decision-making structure for selecting a research design in empirical software engineering. Empirical Software Engineering, 20(6), 1427–1455. https://doi.org/10.1007/s10664-014-9319-7

Wohlin, C., Runeson, P., Höst, M., Ohlsson, M. C., Regnell, B., & Wesslén, A. (2024). Systematic literature studies. In Experimentation in software engineering (2nd ed., pp. 51–63). Springer. https://doi.org/10.1007/978-3-662-69306-3_4

Yang, D., Martins, P., Saini, V., & Lopes, C. (2017). Stack Overflow in GitHub: Any snippets there? 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), 280–290. https://doi.org/10.1109/MSR.2017.13

Yang, X., Lo, D., Xia, X., Zhang, Y., & Sun, J. (2015). Deep learning for just-in-time defect prediction. 2015 IEEE International Conference on Software Quality, Reliability and Security, 17–26. https://doi.org/10.1109/QRS.2015.14

Zeller, A., & Lütkehaus, D. (1996). DDD—a free graphical front-end for UNIX debuggers. ACM SIGPLAN Notices, 31(1), 22–27. https://doi.org/10.1145/249094.249108

Zobel, J. (2014). Writing for computer science. Springer. https://doi.org/10.1007/978-1-4471-6639-9