An experimental framework for generating CUDA kernels from graphical representations of algorithms is presented. Kernels and threads are identified by exploiting the disconnectivity and isomorphism of these graphs. It is shown that kernels generated with this method exhibit pre-alignment of input memory. For square matrix multiplication, the generated kernels achieve a speedup of up to 2.8× over a handwritten naive implementation.
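For context, the following is a minimal sketch of the kind of handwritten naive baseline such a comparison typically uses; the kernel name, block shape, and row-major layout are illustrative assumptions, not details taken from the framework itself.

```cuda
#include <cuda_runtime.h>

// Naive square matrix multiply: one thread computes one element of C = A * B.
// Matrices are N x N and stored row-major in global memory (assumed layout).
__global__ void matmul_naive(const float *A, const float *B, float *C, int N)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < N; ++k)
            acc += A[row * N + k] * B[k * N + col];
        C[row * N + col] = acc;
    }
}

// Illustrative launch: 16x16 thread blocks tiling the N x N output.
void launch_matmul_naive(const float *dA, const float *dB, float *dC, int N)
{
    dim3 block(16, 16);
    dim3 grid((N + block.x - 1) / block.x, (N + block.y - 1) / block.y);
    matmul_naive<<<grid, block>>>(dA, dB, dC, N);
}
```

A kernel of this form issues strided, uncoalesced reads of one operand, which is the kind of access pattern the reported pre-alignment of input memory would be expected to improve on.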