So if I'm tracking the progress correctly, now we should be able to do: Single Image -> Gaussian Splats -> Object Identification -> [Nearest Known Object | Algo-based shell] Mesh Generation -> Use-Case-Based Retopology -> Style-Trained Mesh Transformation
Which would produce a new mesh in the style of your other meshes, based on a single photograph of a real-world object.
...and, at this speed, you could do that as a real-time(ish) import into a running application/game.
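In pseudo-Python, the glue I'm imagining looks something like this (purely hypothetical; every stage name here is made up, and each one would be its own model or tool in practice):

    # Hypothetical glue for the pipeline above; pass in whatever
    # implements each stage, since none of these exist under these names.
    def photo_to_styled_mesh(image, splatter, identifier, mesher, retopo, stylizer):
        splats = splatter(image)      # single image -> Gaussian splats
        label = identifier(splats)    # object identification
        mesh = mesher(splats, label)  # nearest known object / algo-based shell
        mesh = retopo(mesh)           # use-case-based retopology
        return stylizer(mesh)         # style-trained mesh transformation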
Gotta say, I'm looking forward to someone putting these puzzle pieces together! But it really does feel like if we wait another month, there might be some new AI that shrinks that pipeline by another one or two steps! It's an exhausting time to be excited!
Probably a dumb question, but is this trained on lots of images of similar objects, or is it 'just' estimating from the look of the input image?
Like, if you have an image of a car, viewed at an angle, you can gauge the shape of the 3d object from the image itself. You could then assume that the hidden side of the car is similar to the side that you can see, and when you generate a 360 rotation animation of it, it will look pretty good (cars being roughly symmetrical). But if you gave it a flat image of a playing card, just showing the face up side, how would it reconstruct the reverse side? Would it infer it based on the front, or would it 'know' from training data that playing cards have a very different patterned back to them?
Since it's based on 3D Gaussians in space, is there a way to obtain sharp images? Gaussian functions inherently extend infinitely, so don't images always look blurry? Of course, \sigma can be optimized to be small, but then it converges to some point representation, doesn't it?
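For concreteness, the per-splat falloff I have in mind is the usual form (my sketch, with \Sigma being the 2x2 screen-space covariance of the projected Gaussian):

    % contribution of one projected splat to a pixel at offset d from its center
    \alpha(d) = \alpha_0 \exp\left(-\tfrac{1}{2}\, d^{\top} \Sigma^{-1} d\right)

That's strictly positive everywhere unless the renderer truncates it at some radius of a few \sigma, hence the question about blur.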
Maybe some CV/ML people can help me understand.
I guess this is how you'd implement that thing in Enemy of the State where they pan around a single-perspective camera view (which I think doesn't come across as absurd in the movie, since the tech guys point out it's basically a clever extrapolation).
Now we can finally turn Street View into a game world!
For anybody wanting to take a look at the code, this time the GitHub link does include it. It's not one of those empty repos that are typical of "too good to be true" publications.
Am I imagining this, or is somebody making a newer and faster one of these every day?
I'm expecting Overwhelming Fast Splatter by January.
For a change, the [code] link works, but the [arXiv] link is missing. Have to say, this looks really interesting!
The paper link doesn't work for me. The correct link is https://arxiv.org/pdf/2312.13150.pdf
Wouldn't it be more useful to generate a vector model than a "3D image" (voxel/radiance field/splats/whatever it's called)? Apart from the use case of "I want to spin the thing or walk around in it", they feel like they're of limited use.
Unlike, say, a crude mesh of a fire hydrant, which you could throw into a game or whatever. Maybe that's possible if the model is fed some more constraints/assumptions? I think I saw a recent paper that generated meshes instead of pixels.
All I have to say is "ENHANCE!"
This would be more powerful if you could optionally feed it more input images for a better result.
This could prove useful for autonomous navigation systems as well.
That "GT" method seems even better, we should just use that. /s
Oof, the dependency tree on this.
It uses diff-gaussian-rasterization from the original Gaussian splatting implementation (which is a linked git submodule, so if you're cloning, remember to use --recursive to actually download it).
But that is written in mostly pure CUDA.
That part is just used to display the resulting Gaussian-splatted model, and there have been other cross-platform implementations to render splats – there was even that web demo a few weeks ago that used WebGL [0] – and if one of those were used as the display output in place of the original implementation, I see no reason people couldn't use this on non-Nvidia hardware.
edit: also, device="cuda" is hardcoded in the torch portions of the training code (sigh!). It doesn't have to be; PyTorch could probably push this onto mps (Metal) just fine.
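A minimal sketch of what that fallback could look like (the helper name is mine, not from the repo):

    import torch

    def pick_device() -> torch.device:
        # Prefer CUDA, fall back to Apple's Metal backend (mps), then CPU,
        # instead of hardcoding device="cuda" everywhere.
        if torch.cuda.is_available():
            return torch.device("cuda")
        if torch.backends.mps.is_available():
            return torch.device("mps")
        return torch.device("cpu")

    device = pick_device()
    x = torch.randn(4, 3, device=device)  # lands on whichever backend exists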
[0] https://github.com/antimatter15/splat?tab=readme-ov-file