Forest Kobayashi
JWB 119 | Email: fkobayashi@math.utah.edu | he/him
I am a postdoc at The University of Utah supervised by Anna Little. My interests are primarily in optimization problems in stochastic and statistical contexts (particularly Optimal Transport and the Calculus of Variations), and their applications to machine learning and biological data analysis. I am interested both in the pure theory of these problems and in the design of high-performance algorithms for real-world use; in the latter context, I am especially interested in cases where the presence of extra structure helps avoid the curse of dimensionality.
I did my PhD at The University of British Columbia, where I was fortunate to be supervised by Young-Heon Kim; I defended in July 2025. My PhD thesis concerned approximating high-dimensional probability distributions via lower-dimensional structures. Before that, I did my undergraduate studies at Harvey Mudd College, where my thesis concerned certain questions in Knot Theory. For more information, see my CV.
Here are links to my arXiv, GitHub, GitLab, ORCID, and Google Scholar accounts. These can also be accessed by clicking the logos at the bottom of the page.
Preprints
O'Brien, L.; Kobayashi, F.; Kim, Y.-H. (2025). Structure of average distance minimizers in general dimensions. Posted on arXiv.
Less-technical abstract: The average distance problem (ADP) gives a model for how to define the “best” 1d shape (of finite total length) that “fills” a given region, where we are given a weighting that tells us that some areas of the region are more or less important to fill than others. With this setup, we proved (in general dimensions, and for most values of a certain parameter) that optimal solutions have the topology of a tree.
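In symbols (a schematic formulation in my own notation, not copied verbatim from the paper): given a region Ω, a weighting measure μ on Ω, and a length budget L, the ADP asks for

\[
\min_{\Sigma} \int_{\Omega} \operatorname{dist}(x, \Sigma) \, d\mu(x)
\quad \text{over compact, connected } \Sigma \subseteq \Omega \text{ with } \mathcal{H}^1(\Sigma) \le L,
\]

where \(\mathcal{H}^1\) denotes 1-dimensional Hausdorff measure (i.e., total length).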
Motivating example: Think of the problem of trying to design the “most efficient” water pipe network for a city. It’s instructive to consider two naïve solutions that both end up being inefficient. On one extreme, you could give every client a direct water pipe that connects to the central pumping station and nowhere else. This manages to get every client water, but in most situations it ends up being extremely inefficient in terms of the total amount of piping you have to use.
On the other extreme, you could try making your network one really long pipe that visits each client in sequence (e.g. a TSP tour). This is usually better than the first solution, but you can still end up doing some things inefficiently. For example, if you have three clients arranged at the vertices of an equilateral triangle of side length 1, and the pumping station is at the centroid of the triangle, then visiting the clients in sequential order ends up requiring noticeably more pipe than simply connecting the station to each client directly, as the computation below shows.
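A quick sanity-check computation (mine, not from the paper): the centroid of a unit-side equilateral triangle lies at distance \(1/\sqrt{3}\) from each vertex, so the sequential tour and the three direct “spokes” compare as

\[
\underbrace{\tfrac{1}{\sqrt{3}} + 1 + 1}_{\text{station, then clients in sequence}} \approx 2.58
\qquad \text{vs.} \qquad
\underbrace{3 \cdot \tfrac{1}{\sqrt{3}}}_{\text{three direct spokes}} = \sqrt{3} \approx 1.73.
\]

(For this symmetric configuration the three-spoke star is in fact the optimal Steiner tree, since the centroid is the triangle’s Fermat point.)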
The optimal solution should probably instead be some sort of compromise between these two pictures: Maybe at a coarse scale you can “cluster” groups of points together (representing e.g. a single neighborhood in the city), send one big “artery” pipe out to each cluster, and then as you get close, you can split these “artery” pipes into smaller “capillary” pipes that branch out towards each individual client. In fact, the choice of terms “artery” and “capillary” here is evocative, since the ADP can also serve as a model for designing efficient vascular networks for oxygenating tissue!
What we did: Rather than considering the case of discrete “clients” (which would fall under the umbrella of the Steiner problem; see e.g. Paolini’s papers on the Steiner problem), we consider the case where there are so many clients that it makes sense to model them via a continuous distribution (i.e. a probability measure). It turns out one can show that, as in the Steiner problem, optimal solutions must be trees.
Why we might care: The characterization of the topology of optimizers means we can create more-principled numerical approaches that explicitly account for branching etc., as will be detailed in a forthcoming work with Wenjun Zhao and Young-Heon Kim.
@article{OBrien2025Mar,
author = {O'Brien, Lucas and Kobayashi, Forest and Kim, Young-Heon},
title = {{Structure of average distance minimizers in general dimensions}},
journal = {ArXiv e-prints},
year = {2025},
month = mar,
eprint = {2503.23256},
doi = {10.48550/arXiv.2503.23256}
}
Warren, A.; Afanassiev, A.; Kobayashi, F.; Kim, Y.-H.; Schiebinger, G. (2025). Principal curves in metric spaces and the space of probability measures. Posted on arXiv.
Less-technical abstract: The principal curve problem is to find the “most efficient” curve that “fits” a point cloud. Here, “efficient” means that the curve should strike a good balance between being close (on average) to all the data points while also remaining relatively simple (i.e. not too long). We consider this problem in the context of geodesic metric spaces and establish existence and consistency results. We also do some numerical experiments for the particular case of principal curves in Wasserstein space, which finds applications in the context of single-cell RNA sequencing.
Motivating example: Suppose you’re given a collection of data that you think was generated from a 1d curve (possibly with some noise). Assume that the data is given as a point cloud, with no information ordering the points along the generating curve. The question is, given enough data, can you reconstruct a good guess for what the generating curve was? (Essentially, this is like a nonlinear, unsupervised version of the least-squares problem.) The principal curve problem gives one possible formulation of this question.
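Schematically (my notation; the paper’s setup is more careful), a principal curve solves something like

\[
\min_{\gamma} \ \mathbb{E}\big[d(X, \gamma)^2\big]
\quad \text{subject to } \operatorname{length}(\gamma) \le \ell,
\]

i.e., minimize the average (squared) distance from the data \(X\) to the curve \(\gamma\), with a length budget that keeps the curve from simply interpolating all of the noise.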
As it turns out, for the most part, the fundamental structure in this problem depends only on the metric axioms and the existence of geodesics. So, we can work in the general context of a geodesic metric space. This is useful because one of our motivating questions comes from single-cell RNA sequencing data.
The idea is something like the following: Suppose we start out with a collection of cells of various types (some stem cells, some specialized cells, etc.). As time progresses, maybe some of the stem cells specialize, or some of the specialized cells change further. We want to understand how the distribution of cell types in the sample evolves over time. In order to do this, we want to observe the distribution of RNA in the cells in our sample at various points in time. However, there is a problem: The measurement process is necessarily destructive. In order to find out what the distribution of RNA inside a given cell is, we have to lyse the cell, thus killing it and preventing us from taking a measurement of it in the future. What can we do instead?
One option is to try to get a ton of data from starting configurations that we think should be similar (possibly with variations in how far along they’ve evolved in time), let them all sit for a bit, and then measure each of them. For example, we could take tissue samples from a bunch of cloned mouse embryos, measure all the RNA present in the cells from each embryo, and plot the distribution for each one. Due to experimental constraints, it might not be feasible to determine a priori which embryos are farther along in the development cycle at the time of measurement, so this is something we’d want to be able to figure out on our own.
To model this mathematically, we can represent the RNA distributions from each of our embryo samples as probability measures. In this way, the data from each embryo (which might consist of a complicated discrete measure representing each of the RNA reads we get from that embryo) ends up looking like a single point in Wasserstein space. (See figure 1 on page 3 of our paper for an illustration.) Since Wasserstein space is a geodesic metric space, we can then apply the results of our paper to say things about consistency and numerical approaches for recovering the proper time ordering.
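To make the “each dataset is one point in Wasserstein space” idea concrete, here is a minimal sketch (not code from the paper; it uses the open-source POT library, and the data is a synthetic stand-in) that represents each hypothetical “embryo” as an empirical measure and computes the matrix of pairwise 2-Wasserstein distances:

import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)

rng = np.random.default_rng(0)
# Synthetic stand-in data: 5 "embryos," each a cloud of 100 points in R^2
# (real data would be high-dimensional RNA expression profiles).
clouds = [rng.normal(loc=float(t), scale=1.0, size=(100, 2)) for t in range(5)]

n = len(clouds)
W2 = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        M = ot.dist(clouds[i], clouds[j])  # squared-Euclidean cost matrix
        a = np.full(len(clouds[i]), 1 / len(clouds[i]))  # uniform weights
        b = np.full(len(clouds[j]), 1 / len(clouds[j]))
        W2[i, j] = W2[j, i] = np.sqrt(ot.emd2(a, b, M))  # exact W2 distance

print(np.round(W2, 2))  # each cloud is now a single "point" in Wasserstein space

Recovering the developmental time-ordering then amounts to fitting a (principal) curve through these points in Wasserstein space.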
What we did: We analyzed the principal curve problem in the general context of geodesic metric spaces, so that our results apply not only to the “simple” case of data in Euclidean space, but also to more exotic settings such as the Wasserstein space of probability measures.
Why we might care: The principal curve methods we introduced give an alternative to previous approaches to recovering the time-ordering (the so-called seriation problem), in particular spectral seriation methods. See our discussion in Appendix E for details.
@article{Warren2025May,
author = {Warren, Andrew and Afanassiev, Anton and Kobayashi, Forest and Kim, Young-Heon and Schiebinger, Geoffrey},
title = {{Principal Curves in Metric Spaces and the Space of Probability Measures}},
journal = {ArXiv e-prints},
year = {2025},
month = may,
eprint = {2505.04168},
doi = {10.48550/arXiv.2505.04168}
}
Kobayashi, F.; Hayase, J.; Kim, Y.-H. (2024). Monge-Kantorovich Fitting With Sobolev Budgets. Posted on arXiv; the manuscript is currently under revision for JMLR.
We quantify the performance of an approximation via the Monge-Kantorovich p-cost between the target measure and measures supported on the approximating object, and we quantify its complexity via a Sobolev-norm “budget” on a parametrization of that object.
We study the “gradient” of the resulting objective functional, which plays a central role in both our theoretical analysis and our numerical experiments.
Less-technical abstract: Essentially, we consider the following question: Given an n-dimensional distribution of mass and an m-dimensional “material” (with m < n), how well can we mold the material to cover the distribution, subject to a budget on how much we are allowed to bend and twist it?
Anyway, it turns out this problem can serve as a model for certain generative machine learning problems. Very loosely, this makes some intuitive sense: If you think about feeding a prompt into e.g. an image generation tool, you’re essentially giving a relatively “low-dimensional” input (a short string of text characters) and trying to get out a very “high-dimensional” output (an image). With this analogy, we are able to introduce a novel interpretation of the role regularization plays in improving the training of certain generative networks. The idea is that it’s not just about avoiding overfitting: regularization can encourage the model to stick to parameter ranges that yield less extreme gradients, and thus better expressiveness within the model class.
Motivating example: Basically, imagine tying a balloon animal, or folding origami. In those cases, we start with plain 1-d/2-d objects (respectively, cylindrical balloons and sheets of paper), and want to “twist them up” in just the right way so as to approximate a familiar 3-d shape. Here, the material properties of the balloon/sheet of paper are important: Balloons can only be twisted so many times before popping, and paper famously can’t be folded too many times before becoming unworkable. These sorts of material limitations can be captured via restrictions on the smoothness of our approximating object; in the paper, this takes the form of a Sobolev-norm budget.
There is a lot more to say about this paper, but it’s hard to capture all of the ideas in a summary box while remaining concise. We tried to make the introduction intuitive and thorough, so it might be a good idea to read that. Alternatively, see the explanatory material in my PhD thesis, which contains this paper as one of its parts.
@article{Kobayashi2024Sep,
author = {Kobayashi, Forest and Hayase, Jonathan and Kim, Young-Heon},
title = {{Monge-Kantorovich Fitting With Sobolev Budgets}},
journal = {ArXiv e-prints},
year = {2024},
month = sep,
eprint = {2409.16541},
doi = {10.48550/arXiv.2409.16541}
}
Kobayashi, F. (2020). Performing Countably Many Reidemeister Moves. Posted on arXiv with a different title. I forgot to submit it while I was in grad school, and I am no longer dialed in to contemporary happenings in Knot Theory, so I am not sure whether the contents of the document are still a topic of interest. If you might have an insight here, please get in touch!
Q: “Why hasn’t this been submitted to a journal yet?”
This project is something I wrote based on my undergrad thesis. I have been meaning to go back and get it “submission-ready” for a while now, but that fell by the wayside when I started getting my PhD research up and running (I’m in a totally different field now).
I’m also slightly paranoid that the results are known already (though I was more or less unable to find a reference when I did my literature search back in 2020). It seems like something that should be folklore. My undergraduate advisors didn’t know of existing work in this direction, but then again, neither of them had worked with wild knots before.
What’s in the paper?
The executive summary is that the paper addresses certain cases where you can vs. can’t perform countably-many Reidemeister moves.
As motivation, I present two different curves whose diagrams both look like they “should” be unknots. However, only one of them is (this can be shown using a sort of “invariant” for tame knots).
This is strange, because it really looks like you should be able to unknot them both using Reidemeister moves. The problem is subtle, but maybe obvious once you see it. Essentially, for the wild knot, performing all the Reidemeister moves “drags” points in the ambient space along in such a way that, as you undo the crossings, you pull more and more points of the ambient space down toward a wild point. In the limit, you lose bijectivity of your ambient isotopy, because at least countably many distinct points of the ambient space get mapped to the wild point.
In the end, I state a conjecture about how, for a certain class of diagrams, adding a fourth Reidemeister move suggested by Kye Shi might recover the expected correspondence “diagrams equivalent by moves if and only if the curves are ambient isotopic.”
It would be cool to see further work on this, since as far as I can tell the theory of wild knots has gone largely untouched for the past half-century-or-more. It would also be nice to know if there are some obscure references for this sort of thing that I didn’t find during my literature review.
@article{Kobayashi2021Jan,
author = {Kobayashi, Forest},
title = {{Uniform Convergence and Knot Equivalence}},
journal = {ArXiv e-prints},
year = {2021},
month = jan,
eprint = {2101.04106},
doi = {10.48550/arXiv.2101.04106}
}
Publications
Kobayashi, F.; Nelson, S. (2020). Kaestner brackets. Topology and its Applications.
@article{Kobayashi2020Aug,
author = {Kobayashi, Forest and Nelson, Sam},
title = {{Kaestner brackets}},
journal = {Topology and its Applications},
volume = {282},
pages = {107324},
year = {2020},
month = aug,
issn = {0166-8641},
publisher = {North-Holland},
doi = {10.1016/j.topol.2020.107324},
url = {https://www.sciencedirect.com/science/article/pii/S0166864120302674},
keywords = {Biquandle brackets, Quantum enhancements, Skein invariants, Parity biquandles, Virtual knots and links}
}
PhD Thesis
Kobayashi, F. (2025). Optimal transport approximation of measures via lower-dimensional structures: principal manifolds and the average distance problem. PhD Thesis.
In both problems we quantify the performance of an approximating set via an optimal transport cost; roughly, the Monge-Kantorovich p-cost between the target measure and measures supported on the approximating set.
The two problems differ in the way we quantify the “complexity” of the approximating sets. In the first problem (Part I) we consider a class of constraints defined in terms of parametrizations: the approximating set must be the image of a map from an m-dimensional cube whose Sobolev norm is bounded by a fixed budget.
One of the key concepts we develop along the way is the “gradient” of the objective functional with respect to the parametrization, which underpins both the variational analysis and the numerical methods.
In the second problem (Part II) we restrict ourselves to compact, connected, 1-dimensional approximating sets of bounded total length; this is the average distance problem discussed above.
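Schematically (my shorthand, not the thesis’s exact notation), the two complexity constraints are:

\[
\text{Part I: } \Sigma = f([0,1]^m), \ \|f\|_{W^{k,q}} \le B
\qquad \text{vs.} \qquad
\text{Part II: } \Sigma \text{ compact, connected}, \ \mathcal{H}^1(\Sigma) \le L.
\]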
Lay summary (page iii; equiv. 4 of 252): The ideas in this thesis are, roughly, a mathematical version of balloon modeling. In tying a balloon model, one begins with a rather uniform, featureless object: a cylindrical balloon. But, by twisting it in just the right ways, we can represent complicated 3-d objects: a swan, a dog, a flower, or a sword! Naturally, there is a limit to the balloon’s expressiveness: Add one too many fancy twists and it could pop. Judiciousness is essential; one must select only the most important features to represent (e.g. the ears/tail of the dog, or the petals of the flower). Origami works similarly, but using a 2-d object—a sheet of paper—rather than a 1-d balloon.
Analogously, this thesis considers a family of problems in approximating complicated n-d shapes by twisting up contiguous m-d ones. The goal is to represent the object’s essential features while remaining as simple as possible.
Additional content over constituent papers: This is described in the preface on page iv (5 of 252). The main highlight is the much more thorough treatment of the numerical routines for the Sobolev principal manifolds. However, there are also some important additions over the “Monge-Kantorovich Fitting With Sobolev Budgets” paper; namely, more explanatory text and diagrams, a more thorough explanation of the differences between the soft-penalty and hard-constraint formulations (pages 24-28; i.e., 48-52 of 252), expanded discussion of nonuniqueness of optimizers (Counterexample 3.2.5), a detailed explanation of a certain pathological ridge set (Example 3.3.1), and some discussion of the second variation.
@phdthesis{Kobayashi_2025,
series={Electronic Theses and Dissertations (ETDs) 2008+},
title={Optimal transport approximation of measures via lower-dimensional structures : principal manifolds and the average distance problem},
url={https://open.library.ubc.ca/collections/ubctheses/24/items/1.0449621},
DOI={10.14288/1.0449621},
school={University of British Columbia},
author={Kobayashi, Forest},
year={2025},
collection={Electronic Theses and Dissertations (ETDs) 2008+}
}