Contributing to Houdini

🎉🎉 First off, thanks for the interest in contributing to Houdini! 🎉🎉

This document should provide some guidance for working on the project, including some tips for local development as well as an introduction to the internal architecture and the relevant files/directories.

Note: this document contains links to files and sometimes specific lines of code which could easily be invalidated with future work. If you run into a link that’s broken or doesn’t look right, please open a PR that fixes it. Keeping documentation up to date is as important as any bug fix or new feature.

Local Development

The quickest way to test and develop new features is by using the end-to-end tests. Starting with pnpm i && pnpm build at the root of the repository will handle the details of linking everything up. Once pnpm is done, run pnpm dev inside of the e2e/kit directory to start both the web app & API development servers. After all of this, you should be able to visit localhost:5173 in a web browser and see the end-to-end test suite. We recommend creating a route in this application to work against. Don’t worry about where it “belongs” - we’ll make sure it goes in the right place when your PR is open.

General Introduction

At a high level, houdini is broken up into a few parts. The core houdini project is located at packages/houdini, which provides the core artifact generation pipeline, cache runtime, vite plugin, and a collection of utilities for building extensions to the system. Apart from that core, Houdini has framework-specific bindings that take the artifacts and cache generated by the core package and use them to deliver an experience tailored to the specific framework. Chances are, if you are reading this, you care most about the svelte (and kit) bindings. Those live in packages/houdini-svelte. That package provides a plugin that extends the core houdini package to support the API you are using (it generates stores, transforms files, generates loads for kit, etc).

Code Generation

Houdini’s code generation lives in packages/houdini/src/codegen. There are a few different places that use the function exported from this directory, but this directory is ultimately responsible for generating the artifacts that describe every document in a project (identified as strings tagged with graphql). These artifacts not only save the runtime from parsing the user’s documents but also enable core features such as compiling fragments and queries into a single string that can be sent to the API. The codegen pipeline is built out of tasks that operate on the strings found in a project. These tasks fall into three categories: validators, transforms, and generators.
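
The exact signatures live in packages/houdini/src/codegen, but as a rough mental model, each task is a function that receives the collected documents and either checks them, rewrites them, or writes something to disk. The types below are purely illustrative and are not the real API:

import type { DocumentNode } from 'graphql'

// Illustrative only: the real types live in packages/houdini/src/codegen.
// A collected document pairs the user's graphql string with its parsed AST.
type CollectedDocument = {
	name: string
	document: DocumentNode
}

// validators inspect the documents and throw on anything invalid
type Validator = (docs: CollectedDocument[]) => Promise<void>

// transforms rewrite the documents in place (adding internal directives, etc.)
type Transform = (docs: CollectedDocument[]) => Promise<void>

// generators write things to disk (artifacts, stores, runtime files, etc.)
type Generator = (docs: CollectedDocument[]) => Promise<void>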

Internal GraphQL Schema

There are a number of features that rely on things that aren’t defined in the project’s schema. Most of these are added temporarily by the schema transform and are eventually removed from the document to prevent the server from encountering anything unknown. The fragments used for list mutations are currently generated in a separate transform. Since the operation fragments are passed along to the server as part of the collectDefinitions transform, they don’t need to be removed and are used to make sure the server returns the data needed for the operation. Whether they are removed from the final query or not, the artifact generator looks for these internal schema elements to encode additional information in the document’s artifact that tells the runtime how to handle the response from the server.

Document Artifacts

The document artifacts are constructed by generating a javascript abstract syntax tree (AST) and printing it before writing the result to disk. This is done using the awesome recast library, but it can still be tricky to get right. The online AST Explorer is incredibly useful for figuring out which nodes to build in order to produce the desired code.
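
If you haven’t used recast before, here is a minimal sketch of the pattern (the real generator is considerably more involved): build the nodes with the ast-types builders that recast re-exports, then let recast print them.

import * as recast from 'recast'

const b = recast.types.builders

// build the AST for: export default { raw: "query { viewer { id } }" }
const ast = b.program([
	b.exportDefaultDeclaration(
		b.objectExpression([
			b.property('init', b.identifier('raw'), b.literal('query { viewer { id } }')),
		])
	),
])

// recast turns the AST back into source text that can be written to disk
console.log(recast.print(ast).code)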

It’s sometimes helpful to look at the shape of the artifacts that the generate command produces. Rather than outlining every field contained in an artifact (which would likely go stale quickly) we recommend looking at the artifact snapshot tests to see what is generated in various situations. At a high level, the raw field is used when sending actual queries to the server and the selection field is structured to save the runtime from wasting cycles (and bundle size) on parsing and “understanding” what the user wants when they use a specific document. For more information about how these are used, see the cache section.
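
As a rough point of reference, an artifact for the AllUsersQuery example used later in this guide might look something like the sketch below. Only the raw and selection fields are described above; treat everything else (and the exact selection shape) as illustrative and defer to the snapshot tests:

// a rough, intentionally abbreviated picture of a generated artifact
const AllUsersQueryArtifact = {
	name: 'AllUsersQuery',
	kind: 'HoudiniQuery',
	// sent to the server as-is
	raw: `query AllUsersQuery { users { id firstName } }`,
	// a pre-chewed description of the document so the runtime never has to
	// parse graphql on the client
	selection: {
		users: {
			type: 'User',
			keyRaw: 'users',
			fields: {
				id: { type: 'ID', keyRaw: 'id' },
				firstName: { type: 'String', keyRaw: 'firstName' },
			},
		},
	},
}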

The Vite Plugin

The base vite plugin is defined in packages/houdini/src/vite/index.ts and takes care of a few tasks such as polling the API for schema changes. The meat of your experience is defined in the svelte plugin as a pipeline that looks at every string tagged with graphql and mutates it into something the runtime can use. It is built out of a few different plugins that work together to deliver a seamless dev experience.
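
If you haven’t written a vite plugin before, the sketch below shows the general shape of a transform step. It is not the real houdini plugin, just the pattern it builds on:

import type { Plugin } from 'vite'

// Not the real plugin: just the general shape of a vite transform step.
function houdiniSketch(): Plugin {
	return {
		name: 'houdini-sketch',
		async transform(code, id) {
			// only look at files that could contain tagged documents
			if (!id.endsWith('.svelte') && !id.endsWith('.js') && !id.endsWith('.ts')) {
				return
			}

			// ...find the graphql-tagged strings in `code` and replace them with
			// references to the generated artifacts/stores...
			return { code }
		},
	}
}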

Faking +page.js

One of the core pieces of the plugin is the ability to fake the existence of +page.js in the eyes of vite and sveltekit. This logic lives in the fsPatch plugin and requires two things. First, we have to patch node:fs so that vite and kit think the file exists when necessary. Second, vite has to be configured so that an import of one of the fake +page.js files still resolves. With those two things in place, the transform pipeline treats the fake files as empty.
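
To make the node:fs half of that concrete, here is a heavily simplified sketch of the idea. The real fsPatch plugin covers more entry points and edge cases, so treat this as illustrative only:

import fs from 'node:fs'

// Illustrative only: wrap the fs call that vite and kit use to check for a
// file so that a missing +page.js looks like it exists.
const _statSync = fs.statSync

fs.statSync = ((path: fs.PathLike, options?: fs.StatSyncOptions) => {
	try {
		return _statSync(path, options)
	} catch (err) {
		// the file isn't on disk; if it's a route file we want to fake,
		// return the stats for something that does exist instead of throwing
		if (typeof path === 'string' && path.endsWith('+page.js')) {
			return _statSync('.', options)
		}
		throw err
	}
}) as typeof fs.statSync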

The Runtime

The actual runtime used by houdini is split into multiple parts. The part shared by all frameworks is defined in packages/houdini/src/runtime and contains the cache definition as well as some utilities for fetching data and other tasks that are shared by everyone.

The Cache

As with most of this guide, the most reliable place to get an understanding of how the cache’s internals are organized is the test suite. However, here is a brief explanation of the overall architecture so you can orient yourself:

Houdini’s cache is built on top of two core interactions: writing data and subscribing to a given selection. In order for a value to be written to the cache, the cache must be given the data along with schema information for the payload. In response, the cache walks down the result and stores the value of every field that it encounters in an object mapping the entity’s id to the set of field values. This data is stored in normal form, which means that references to other entities are not stored like scalar values but are instead stored as references to other entries in the same map. This gives us a single place where updates can be applied, without worrying about where that information is used. While walking down the provided selection, the cache looks for information embedded by the artifact generator to perform additional tasks like updating a list.
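
To make “normal form” concrete, here is roughly what a write does to a nested payload. The key format and the surrounding bookkeeping are implementation details, so check the cache tests for the real structures:

// the server responds with nested data...
const payload = {
	viewer: {
		id: '1',
		firstName: 'Alec',
		bestFriend: { id: '2', firstName: 'Jean' },
	},
}

// ...and the cache flattens it into a map keyed by entity id. Links to other
// entities are stored as references rather than copies, so an update to
// User:2 lands in one place no matter how many queries selected it.
const storage = {
	'User:1': { id: '1', firstName: 'Alec', bestFriend: { ref: 'User:2' } },
	'User:2': { id: '2', firstName: 'Jean' },
}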

While writing data is an important part of the interaction with the cache, the real “meat” is in the subscription architecture which keeps the store returned by query (or fragment) up to date as values are changed. Just like when writing data, the cache must be given an object that describes the full selection of data that the store would like. However, it also needs a function to call when the data has changed. In practice, this function is just the set corresponding to the writable store powering a given query or fragment. With these two things, the cache walks down the provided selection and embeds a reference to the set function alongside the field values for a given object. When data is written to the cache, houdini looks at the values being updated, captures every set function that must be called, and invokes the function with an object matching the entire corresponding selection.
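
The toy sketch below is not houdini’s actual API, but it captures the shape of the interaction between writing, subscribing, and a svelte store’s set:

import { writable } from 'svelte/store'

// a toy version of the interaction: subscribers register a selection and a
// `set`; any write that overlaps the selection calls `set` again
type Subscriber = { selection: string[]; set: (value: Record<string, any>) => void }

const subscribers: Subscriber[] = []
const fields: Record<string, any> = {}

function subscribe(selection: string[], set: Subscriber['set']) {
	subscribers.push({ selection, set })
}

function write(values: Record<string, any>) {
	Object.assign(fields, values)
	for (const { selection, set } of subscribers) {
		if (selection.some((field) => field in values)) {
			set(Object.fromEntries(selection.map((field) => [field, fields[field]])))
		}
	}
}

// in practice, `set` is just the setter of the writable store behind a query
const store = writable<Record<string, any> | null>(null)
subscribe(['firstName'], store.set)
write({ firstName: 'Alec' }) // the store now holds { firstName: 'Alec' }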

For a general introduction to normalized caching for GraphQL queries, check out the urql page on Normalized Caching, which gives a very good overview of the task, even if some of the actual implementation details differ from houdini’s.

End-to-End tests

The best way to ensure that your feature works is to create a test that simulates the user’s experience. We do this using a suite of end-to-end tests that verify some of the more complex behaviors. You can find these tests in the dedicated e2e folder at the root of Houdini’s repository.

The general idea is to create specific routes in a SvelteKit application that showcase a single behavior and then use [Playwright](https://playwright.dev/) to simulate a user interacting with the UI and verify that things are working as expected.
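
A typical test ends up looking something like this; the route and the assertion are hypothetical, so copy the patterns you find in the existing e2e/kit tests:

import { expect, test } from '@playwright/test'

// the route and the expected count are hypothetical; pick whatever your
// feature's page renders
test('all users are rendered', async ({ page }) => {
	await page.goto('/users')
	await expect(page.locator('li')).toHaveCount(3)
})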

Piecing It All Together

If you made it this far in the guide, you’re awesome - even if you just skipped ahead. Either way, when it’s time to start thinking about adding a feature to the codebase, you should start by asking yourself a few questions:

  1. Does the feature appear in the graphql documents that a developer will use? If so, you will need to think of a way to persist what the user types in the generated artifacts. Remember that the runtime will walk down the selection field when writing values to the cache and can look for special keys in order to perform arbitrary logic when dealing with a server’s response. Once you have the information persisted in the artifact, all that’s left is figuring out how the runtime will handle what’s there.
  2. Are there any validation steps? They’re not there just to protect the user. They can also provide guarantees for the runtime that save you from having to check a bunch of stuff when processing a server’s response.
  3. Can svelte provide any kind of help to the runtime? One of the benefits of generating the entire runtime is that the final code looks like any other code in a user’s project. This means things like reactive statements and life-cycle functions work out of the box.

Remember, an end-to-end feature for houdini will likely touch the artifact generator as well as the runtime (at the very least). It’s easy to get lost in how all of the pieces fit together. In order to help make things more clear, the implementation for list operations will be outlined in the following section.

An Example: List Operations

This section contains links to exact lines of code in order to walk you through how the list operations are implemented, so it will likely fall out of sync with the actual codebase over time. If you encounter an incorrect link, please open a PR to fix it.

There are two parts to this feature. First, a user marks a particular field as a valid target for operations:

query AllUsersQuery {
	users @list(name: "All_Users") {
		id
		firstName
	}
}

… and then uses a fragment in the mutation to update the list:

mutation AddUserMutation {
	addUser(firstName: "Alec") {
		...All_Users_insert
	}
}

The steps for updating the generate function to support this feature can be broken down into the following:

  1. Add the list directive to the project’s schema. As mentioned earlier, this is done in the schema transform.
  2. Define the operation fragment somewhere that the collectDefinitions transform can pick it up to include in the mutation query when it’s sent to the server. This happens in the list transform.
  3. When generating the artifacts for the query, remove any references to the @list directive and leave behind a label identifying the field as the “All_Users” list. For a better idea of how this label is embedded in the artifact, look at the list filters test.
  4. When generating the artifact for the mutation, look for any fragment spreads that are list operations and embed them in the selection object for the mutation. For a better picture of how this looks in the final artifact, look at the insert operation test or the sketch just after this list.
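
To tie steps 3 and 4 together, here is an illustrative (and intentionally simplified) picture of what might end up in the two artifacts. The field names are based on the description above, not the exact output, so defer to the linked snapshot tests:

// the query's selection keeps a label marking the field as the All_Users list
const querySelection = {
	users: {
		type: 'User',
		keyRaw: 'users',
		list: { name: 'All_Users' },
		fields: {
			id: { type: 'ID', keyRaw: 'id' },
			firstName: { type: 'String', keyRaw: 'firstName' },
		},
	},
}

// the mutation's selection records the operation to apply to that list when
// the response comes back
const mutationSelection = {
	addUser: {
		type: 'User',
		keyRaw: 'addUser(firstName: "Alec")',
		operations: [{ action: 'insert', list: 'All_Users' }],
		fields: {
			id: { type: 'ID', keyRaw: 'id' },
			firstName: { type: 'String', keyRaw: 'firstName' },
		},
	},
}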

With the information embedded in the artifacts, all that’s left is to teach the runtime how to handle the server’s response; this is broken down into two parts:

  1. When the cache encounters a request to subscribe to a field marked as a list, it saves a handler to that list in an internal Map under the provided name.
  2. When writing data, if the cache encounters a field with a list of operations embedded in the selection object, it inserts the result in the list using the handler it stored in step one. A toy version of both steps is sketched below.
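
Here is a toy rendering of those two steps. The real handlers live inside the cache and deal with much more (filters, where to insert, connection support, and so on), so this only shows the flow of information:

type ListHandler = { append: (data: Record<string, any>) => void }

// step 1: subscribing to a field marked with @list registers a handler by name
const lists = new Map<string, ListHandler>()
lists.set('All_Users', {
	append: (user) => {
		// link the new record into every subscriber's `users` field
	},
})

// step 2: while writing a mutation response, an insert operation found in the
// selection looks the handler up and hands it the new data
function applyOperation(operation: { action: string; list: string }, data: Record<string, any>) {
	if (operation.action === 'insert') {
		lists.get(operation.list)?.append(data)
	}
}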