Neural code generation / Spring 2024


This course is on modeling and synthesizing programs using deep-learning methods, with an emphasis on neural language models. The key goals of this course are to (1) teach students the foundations of constructing a modern neural code generation system based on neural language models, including formal-method aware fine-tuning, inference, and evaluation methods, and (2) explore new research in neural code generation, including improving interaction with human programmers, model reliability and adaptability, and applications to engineering, reasoning, and formal verification.



Teaching Assistants

Logistics

  • Class times: TR 3:30pm - 4:50pm
  • Room: WEH 4625
  • Course identifier: LTI 11891
  • Office hours: See Piazza

Course description

The course covers algorithmic foundations, practical aspects, and open research frontiers in neural code generation. The course is separated into two high-level parts: (1) algorithmic and practical aspects of neural language models for code generation, and (2) open research frontiers in neural code generation. Emphasis is placed on the unique aspects of modeling and generating code compared to other sequential data (e.g., text alone).

This course is intended for 2nd year (or higher) graduate students. Familiarity with neural language models is recommended.

Class format

Each class is either an instructor-led lecture or a student-led discussion about a (set of) research paper(s).

Student-led discussions

In a student-led discussion, a group of 3 students presents a set of papers that share a common theme. Presenters decide how much time to devote to each paper and whether to include background information, based on their interests and on what makes the presentation clearest. The primary goal is to structure a discussion with other members of the class. The presentation should include the topics below. Not every paper needs to be included for each topic, but the presentation should be a coherent whole. We have some suggested times below to aim for, including discussion time, but these are guidelines, and if you don’t get to every topic because of an in-depth discussion, that’s totally fine! We’ll grade based on your slides, too.
  • Content: Summarize the content of the papers (motivation, problem definitions, methods, key findings). Include discussion points during the presentation. For instance, are there areas you found surprising, particularly interesting/creative, or difficult to understand? [aim for ~20 minutes, including discussion]
  • Review: Imagine that the papers were submitted to a conference, and you are a reviewer. For at least one paper, make a slide with an overall score and discuss why you gave the paper the score. The intent is to start a discussion with the class. A lively discussion and debate is encouraged! [~20 minutes, including discussion]
  • Future: List some future research ideas related to the themes or ideas covered in the papers. The intent is to start a discussion about outgoing research paths related to the papers. [~10 minutes, including discussion]
  • Reproducibility: Collect resources (e.g., code, data, if any) that might be useful if you wanted to dig deeper. As a starting point for discussion, highlight areas that may be easy or difficult to reproduce. [~10 minutes, including discussion]
Presentations are evaluated on a 10 point scale: 2 points for each of the four topics above, and 2 additional points for the quality of discussion questions that are spread throughout the presentation.

We will have a shared pool of slides hosted on Google Slides, which will be shared a week before the class. Please prefix each slide title with the lecture number so that the slides are easily identified.

For non-presenting students:

  • Pre-assignment: prepare a short summary of one of the class's papers, and at least one discussion question related to the paper. The summary should briefly summarize the motivation, research contributions, method, and at least one key experimental result from the paper (e.g., 1-3 sentences for each of these items). We’ll discuss some of the submitted discussion questions during the class. The summary and question are due at 11:59pm on the day before class.
  • Discussion: non-presenting students are expected to participate in the discussions.
  • Post-assignment: write about one thing that you found interesting from the discussion (2-3 sentences). Refer to a specific part of the discussion in your answer. The post-assignment is due at 11:59pm on the day of class.

Instructor-led lectures

  • Pre-assignment: we will post a list of papers that are relevant to the lecture. Pick one of these, and write a summary and one discussion question as discussed above. We’ll discuss some of the submitted discussion questions during the class. The summary and question are due at 11:59pm on the day before class.
  • Attendance: students are expected to attend lectures.
  • Post-assignment: write about one thing that you found interesting from the lecture (2-3 sentences). Refer to a specific part of the lecture in your answer. The post-assignment is due at 11:59pm on the day of class.

Schedule

Course project

The course can be taken either without a course project for 6 units, or with a course project for 12 units.

The course project simulates a research project on a topic related to the course. This includes (but is not limited to) analysis of a method or paper, replicating a paper's analysis or methods, developing a related method, evaluating a method in a new setting, or other forms of investigation. Project groups are encouraged to propose their own ideas. The instructors will additionally provide a list of ideas that groups can choose from.
  • Project teams: each project team consists of 2-4 members.
  • Mid-semester review: project teams will meet with the instructors roughly halfway through the semester. Each team will prepare a presentation about the current project status. Instructors will discuss next steps with the team and provide feedback.
  • Final review and writeup: project teams will present their project to the class at the end of the semester. Additionally, each team will prepare a writeup in the style of a workshop paper. The paper describes the project, its current results, challenges faced, and the project's future outlook.
NOTE: Course project logistics may be subject to revision prior to the course.

Resources

Grading

In-Class Component (applies to 6- and 12-unit versions)

The in-class component of the course is graded on a 60 point scale, with 2/3 assignments and 1/3 presentation. Specifically:
  • 20 points for discussion and lecture pre-assignments. There are 23 assignments, and each assignment is worth 1 point. You automatically receive a score of 1 for the class in which you are presenting. We will drop the lowest 3 assignment scores out of the 23 assignments to calculate the 20 points.
  • 20 points for discussion and lecture post-assignments. There are 23 assignments, and each assignment is worth 1 point. You automatically receive a score of 1 for the class in which you are presenting. We will drop the lowest 3 assignment scores out of the 23 assignments to calculate the 20 points.
  • 20 points for student presentation. The presentation is graded on a 10 point scale: 2 points each for covering content, review, future, and reproducibility, and 2 points for the quality of discussion questions that are prompted in the presentation. This total score will be multiplied by 2 to get 20 overall points. For each aspect, 0 points means missing/low quality, 1 passable quality, and 2 high quality. For examples of high quality, please see presentations from discussion-based courses such as Princeton COS 597G, or JHU CSCI 601.771.
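The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not an official grading script: the function names, the aspect keys, and the example scores are hypothetical, and it assumes each assignment is scored as 0 or 1.

```python
from typing import Dict, List

NUM_DROPPED = 3  # lowest scores dropped from each set of 23 assignments

# Hypothetical keys for the five presentation aspects, each scored 0, 1, or 2.
ASPECTS = ("content", "review", "future", "reproducibility", "discussion_questions")


def assignment_points(scores: List[int], num_dropped: int = NUM_DROPPED) -> int:
    """Sum the assignment scores after dropping the lowest `num_dropped`."""
    if len(scores) <= num_dropped:
        raise ValueError("need more scores than the number dropped")
    return sum(sorted(scores)[num_dropped:])


def presentation_points(aspect_scores: Dict[str, int]) -> int:
    """Add up the five aspect scores (10 point scale) and double them."""
    for name in ASPECTS:
        if aspect_scores[name] not in (0, 1, 2):
            raise ValueError(f"score for {name} must be 0, 1, or 2")
    return 2 * sum(aspect_scores[name] for name in ASPECTS)


def in_class_points(pre: List[int], post: List[int], aspect_scores: Dict[str, int]) -> int:
    """Total in-class score out of 60."""
    return assignment_points(pre) + assignment_points(post) + presentation_points(aspect_scores)


# Made-up example: 3 missed pre-assignments, 5 missed post-assignments.
pre = [1] * 20 + [0] * 3
post = [1] * 18 + [0] * 5
aspects = {
    "content": 2,
    "review": 2,
    "future": 1,
    "reproducibility": 2,
    "discussion_questions": 1,
}
total = in_class_points(pre, post, aspects)
print(total)  # 54
```

In this example the three missed pre-assignments are fully absorbed by the drop rule (20 points), while only three of the five missed post-assignments are (18 points); the presentation contributes 2 × 8 = 16 points.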

Project Component (applies to 12-unit version)

For students in the 12-unit version of the course, the in-class component (above) will be scaled down by 50%, and the other 50% of the grade will come from a course project with the following grade breakdown. Projects will be in teams of 2-4 students, which do not need to be the same as the discussion presentation groups.
  • Report 1 (due Fri Mar 1): 25%. This will contain a task proposal and analysis, related work, and baseline proposal.
  • Report 2 (due Tue Apr 2): 25%. This will contain baseline approaches, results and analysis of the baselines, and an updated proposal for techniques to explore in the final part of the project.
  • Final Report (due Tue Apr 30): 30%. This extends Reports 1 and 2 with results and analysis from additional techniques explored, and discussion of directions for future work. This should be in the style of a workshop paper.
  • Final presentation (Apr 23rd and 25th): 10%. These will be in-class presentations to the course staff and other students.
  • Project hours (Feb 22nd and Mar 28th): 10% Each group will spend ~15-20 minutes meeting with an instructor, in-class. Teams will come with a few slides on a laptop about the current status of the project and next steps and use these to get feedback. Instructors will grade the updates, with 5% per project hour meeting.
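As a rough illustration of how the 12-unit grade combines the two components, the sketch below weights the in-class score (out of 60) and the project components at 50% each. The names are hypothetical, and it assumes each project component is expressed as a fraction between 0 and 1 of its available credit; the actual grading procedure may differ.

```python
from typing import Dict

# Percent weights within the project component, taken from the list above.
# The 10% for project hours is split as 5% per meeting.
PROJECT_WEIGHTS = {
    "report_1": 25,
    "report_2": 25,
    "final_report": 30,
    "final_presentation": 10,
    "project_hours_1": 5,
    "project_hours_2": 5,
}


def project_percent(fractions: Dict[str, float]) -> float:
    """Weighted project score in percent, given a 0-1 fraction per component."""
    return sum(weight * fractions[name] for name, weight in PROJECT_WEIGHTS.items())


def overall_percent_12_unit(in_class_points: float, fractions: Dict[str, float]) -> float:
    """Combine the in-class score (out of 60) and the project score, 50% each."""
    in_class_percent = 100 * in_class_points / 60
    return 0.5 * in_class_percent + 0.5 * project_percent(fractions)


# Made-up example scores.
example_fractions = {
    "report_1": 0.8,
    "report_2": 1.0,
    "final_report": 0.9,
    "final_presentation": 1.0,
    "project_hours_1": 1.0,
    "project_hours_2": 1.0,
}
overall = overall_percent_12_unit(54, example_fractions)
print(round(overall, 2))
```

Here the in-class component is 54/60 = 90% and the project component is 20 + 25 + 27 + 10 + 10 = 92%, so the combined grade is 0.5 × 90 + 0.5 × 92 = 91%.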

Late policies

Your team will have a total of 5 late days, which you can budget across Report 1, Report 2, and the Final Report. Other than these late days, we will not make exceptions or extend deadlines except for health reasons, so please be frugal with your late days and use them only if necessary. Assignments that are late beyond the allowed late days will be graded down one third of a letter grade per day late (e.g., A to A- for one day, and A to B+ for two days).
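The late policy can be sketched as follows. This is a hypothetical illustration: the grade ladder below A-/B+ and the assumption that remaining late days are applied automatically before any penalty are not stated in the policy.

```python
from typing import Tuple

# Assumed ladder of letter grades in one-third steps; only the A, A-, B+ steps
# come from the policy's example.
GRADE_LADDER = ["A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D+", "D"]


def apply_late_policy(grade: str, days_late: int, late_days_left: int) -> Tuple[str, int]:
    """Return (penalized grade, late days remaining) for one report.

    Assumes remaining late days are used first; each uncovered day lowers the
    grade by one step on GRADE_LADDER, stopping at the bottom of the ladder.
    """
    covered = min(days_late, late_days_left)
    penalty_days = days_late - covered
    index = min(GRADE_LADDER.index(grade) + penalty_days, len(GRADE_LADDER) - 1)
    return GRADE_LADDER[index], late_days_left - covered


print(apply_late_policy("A", 2, 0))  # ('B+', 0)
print(apply_late_policy("A", 3, 5))  # ('A', 2)
```

The first call reproduces the example in the policy (two uncovered days take an A to a B+); the second shows three late days being absorbed by the team's budget with no penalty.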