Very reminiscent of the Be Right Back episode of Black Mirror [1].
A family member recently died unexpectedly, and I have a small collection of texts, emails, and blog posts by them saved on my machine in the small (perhaps delusional) hope that they'll be a useful training set for a them-flavored chatbot. Perhaps even one that's trained to help me with the grief of their loss. Not a huge amount of training data, though. I suspect a training model would have to "fill in the holes" (a la Jurassic Park DNA), and that's where the fun begins.
This is one of the best, most detailed write-ups of how to fine-tune a large language model on custom text that I've seen anywhere.
I was talking to a non-tech friend about all the AI advancements lately, and when she asked me what I thought the biggest risk was, I said it's exactly what we all just experienced over the past three years and realized is awful for humans: prolonged social isolation.
My biggest worry is that AI generated art (be it photos, music, code, etc.) and AI assistants will become so good we won't need other humans to get our social fix.
This is so cool and I plan to try it myself to experience it firsthand, but this is nightmare fuel for my biggest fears about AI.
Wish I had friends who talked mild shit like this! All my friends are nerds who take everything seriously.
On the project, did you do anything about the time dimension? ChatGPT is strictly input -> output, but something like this needs time between messages to feel real (and not run constantly). I imagine adding "time since last message" to the training data + expected output would work.
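One way the "time since last message" idea could be sketched: prefix each training message with the gap since the previous one, so the model learns reply timing along with content. The tuple format and function name below are hypothetical, purely for illustration:

```python
from datetime import datetime

def add_time_gaps(messages):
    """Prefix each message with the seconds elapsed since the
    previous one, so the model can learn realistic reply timing.
    `messages` is a list of (timestamp, sender, text) tuples
    (a made-up format for this sketch)."""
    samples = []
    prev_ts = None
    for ts, sender, text in messages:
        gap = int((ts - prev_ts).total_seconds()) if prev_ts else 0
        samples.append(f"[+{gap}s] {sender}: {text}")
        prev_ts = ts
    return samples

chat = [
    (datetime(2023, 4, 1, 12, 0, 0), "alice", "you up?"),
    (datetime(2023, 4, 1, 12, 0, 45), "bob", "barely"),
    (datetime(2023, 4, 1, 13, 30, 0), "alice", "lunch?"),
]
print(add_time_gaps(chat))
```

At inference time the model would emit the same `[+Ns]` prefix, which the chat app could read off to schedule when the reply actually gets sent.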
In Caprica, the Battlestar Galactica spinoff, a dead character was embodied in a robot and trained on social media content... But in the real world, who owns your LLM dupe's output after you die?
The societal ramifications of such advancements are potentially very disturbing. What I've recently written on the topic myself ...
Human-to-human bonds are going to be more broken than ever before. There is going to be a great appeal in bonding with a machine that never tires of your conversation and will eagerly respond just as you would dream the perfect human should, but never will. A deceptive temptation that will leave you embracing a hollow illusion. With every conversation the AI will know you better, and it will be able to model from billions of conversations until it essentially knows your thoughts and can predict them.
I’ve read plenty about the potential catastrophe of AI putting people out of work or becoming uncontrollable in some grand sci-fi way, but uses like this make me the most concerned for the near-mid term.
Broadly I think the internet, social media, and to some degree the physical arrangement of suburban/car dependent living has had a negative impact on genuine human connection. While the ability of people to interact has increased exponentially, something is missing from those interactions, and we seem to have built societies where people have more material wealth than ever before, but lack the community, friendships, and shared experience that help us find meaning and fulfillment in life. A world where we accelerate this by replacing human connection with machines does not look good to me.
This is awesome, clear and concise explanation. I was talking to my wife and debating whether I should build a model of each of us and store it somewhere in case something tragic happens to either of us. But we decided not to because something about it felt wrong but maybe I’ll make it just in case I regret not doing later on.
I've heard several people doing this and chatting with simulacra of their friends, but you can always just send your friends a message and chat with the real version.
My first inclination was always to try a conversation with a virtual me (yay recursion!) I've always thought that would be fascinating. Or scanning in my old journals from when I was a teenager and training it on that. Once this technology improves a bit more, it could be an incredible vehicle for reflection and personal discovery.
Remember Replika and how people got quite attached to that chatbot?
I imagine in the near future you'll be able to sell your and your friends' chat history to a company building a more advanced, realistic chatbot. Do you want to have a group of friends to hang out with? Buy an organically fabricated and pre-trained chatbot.
Or maybe there's enough emptiness in your life that you go deep and assume one of those friends' identities. Go visit that restaurant "you" always adored; the other guys won't come, but they'll send you hilarious messages about how they got delayed, plus tips on what to order.
This feels like something right out of Philip K Dick -- both Blade Runner and Total Recall had realistic false memories.
This has to be one of the most fascinating discussions I've seen on HN. Imagine the FBI training an AI on all the information they have about a suspect — files and files of statements, social media history, phone taps, etc. — and then interrogating the AI to get enough information to convince a judge to issue a warrant.
idk, LLMs being turned loose and the possibilities that opens up feel different from past major shifts in tech. (I've been around a while.)
Reminds me of the bit in Silicon Valley where Gilfoyle makes a bot of himself to respond to slack messages. Actually the whole final season is pretty relevant to current events.
"Sorry, I'm on vacation right now. If this is an emergency, please head your email with >>LLM to get an AI trained on my personal conversation history to put up with your petty bullshit"
This is great!
One of the things that I find horrific about a lot of LLM projects is that people are taking them so seriously. "They're going to destroy the world!" Or, worse, "I've taken $25m in VC money to see if I can destroy one part of the world!"
But this is lighthearted fun. Instead of putting it in a context where the LLM tendency to bullshit is a problem, here it's exactly what is needed.
Great write up, thanks for posting. I’ve been thinking of doing this for myself.
One thought that’s haunting: how long before AI friends are more interesting and stimulating than any real person would be, leading people to prefer AI over humans to befriend…
Super stimulus to end all super stimuluses?
So after 20 years of 'ethics of AI' mumbling, what we are doing is diving off the deep end. I am not surprised in the least.
Hmm, can I have my private version of HN where everyone upvotes my witty comments?
Interestingly, Steve Jobs was envisioning the future feasibility of having Aristotle, or Aristotle-like figures, modeled and turned into a chatbot back in 1983:
People change, adapt, and adopt new viewpoints. I wonder how the models in these cases weigh the “you” from, e.g., 10 years ago against the “you” now in order to craft a response. How does the AI handle that evolution going forward?
The spice of life with friends is the constant evolution of each of us and the unpredictability in behaviors that are evoked as our updated selves are faced with new experiences.
Presumably you are frozen in time with these AIs, unless all generated chats are fed back in to update the model. In that case it could be very fascinating to see how the AI evolves compared to how you/your friends evolve. Perhaps even Monte Carlo simulations to find the most likely evolutionary path. Super curious whether there'd be any accuracy to it.
Sorry if my question is stupid; I am completely new to this. But why exactly is a Weights & Biases account needed? I thought the training runs on vast.ai.
Highly recommend the TV show Black Mirror, which has an episode called “Be Right Back” where a character talks to her dead husband using an AI trained model in an app type setup.
Very interesting discussion piece on the real world impact and repercussions of these types of systems
Asimov's "Solaria" comes to mind:
> Originally, there were about 20,000 people living in vast estates individually or as married couples. There were thousands of robots for every Solarian. Almost all of the work and manufacturing was conducted by robots.
In our particular universe, "thousands of robots" ended up being "thousands of chatbots", but still, eerily similar.
I’m almost at the point where i don’t want to use the internet anymore.
Four mentions of Black Mirror but none of the Dixie Flatline from Neuromancer...
I'm sure this idea predates Gibson (though I don't know an earlier usage offhand)
What would be even more interesting and dystopian is merging people's personas — a first step would be combining them in the training data, perhaps based on their areas of expertise and eccentricities.
Best thing I learned from this article is that Messages on Mac stores all your messages in a sqlite db. Pretty cool!
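For anyone curious, the database lives at ~/Library/Messages/chat.db. Here's a minimal sketch for pulling texts out of it — note the schema differs between macOS versions, so the table and column names below are a best guess, not a stable API:

```python
import sqlite3

def read_messages(db_path, limit=10):
    """Pull plain-text messages from a Messages-style SQLite db.
    On macOS the real file is ~/Library/Messages/chat.db; the
    `message` table and `text` column match recent versions, but
    treat this as a sketch since Apple changes the schema."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT text FROM message WHERE text IS NOT NULL LIMIT ?",
        (limit,),
    ).fetchall()
    conn.close()
    return [text for (text,) in rows]
```

You'd call it like `read_messages(Path.home() / "Library" / "Messages" / "chat.db")` — though you may need to grant your terminal Full Disk Access first.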
I'm not clear as to how a conversation block is turned into one (or many?) samples. Is the first message in the block the input, and the remaining messages prefixed with sender names and concatenated as output? I know the code is all there, but instead of picking it apart I would have preferred a more complete example mapping a block to a sample, because I don't have a mental model of how LLMs learn from context. On the one hand I doubt individual input/output prompts from just two messages contain enough context, but I would have imagined that inserting names and concatenating multiple messages would be equally misleading. Does the model generate an entire conversation from a single prompt, which is split before being sent by the chat app?
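One plausible mapping — a guess at the general approach, not the article's actual code: everything up to the modeled person's reply becomes the prompt, and the reply itself becomes the completion:

```python
def block_to_sample(block, bot_name):
    """Turn a conversation block into one prompt/completion pair.
    `block` is a list of (sender, text) pairs; `bot_name` is the
    person being modeled. This is a hypothetical scheme for
    illustration, not the write-up's exact pipeline."""
    lines = [f"{sender}: {text}" for sender, text in block]
    # Walk backwards to find the modeled person's last message;
    # everything before it is context, the message is the target.
    for i in range(len(block) - 1, -1, -1):
        if block[i][0] == bot_name:
            prompt = "\n".join(lines[:i]) + f"\n{bot_name}:"
            completion = " " + block[i][1]
            return {"prompt": prompt, "completion": completion}
    return None  # block contains no message from bot_name

block = [("alice", "pizza tonight?"), ("bob", "obviously"), ("alice", "7pm?")]
sample = block_to_sample(block, "bob")
```

A fancier version would emit one sample per bot message so a long block yields many training pairs, each with all the preceding messages as context.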
There used to be a concern that we'd be bored out of our minds once the machines do all the work. Instead, we'll all be busy running the worlds we've created, and we'll complain that we wish we had more time for ourselves, like the good old pre-LLM days.
I would love to chat with all of the following, and maybe let them chat with each other: a Gmail me from Google, a Hacker News comment me, a Reddit comment me, an iMessage me from Apple, a Telegram me, and a WhatsApp/Facebook post me from Meta.
I feel as though the Gmail me would be very efficient and wooden, and the iMessage me would be the most authentic, because that's the one I chat with my family and partner on. WhatsApp/Facebook has exclusively jokes I've made on social media and chats with my best friend, so that one would not be a serious person at all.
I think I've stumbled upon a plot for something here, I'd love to see this as a thing.
I love how people keep using ML to depress birth rates.
I think we may have answered the question as to where all the aliens are...they too invented LLMs and soon went extinct due to no one ever leaving their rooms to reproduce for real.
XD
It’s probably not long before the HN comment section can be fully automated.
About 15 years ago I'd have a similar-enough chat with a colleague that went the same way every time we had it. It was pointless. We'd polarise the same way every time, so why waste the energy. I proposed (and he agreed) that it would be easier to just turn our perspective into bots that could parrot the usual discussion so we could do something (or talk about something) more useful, unless we actually had some value to add that was non-obvious.
It would be interesting to train an LLM on all my work chats and then see how well it does answering questions when I’m on PTO. I could set a status of “ooo but ask my bot” haha.
Great! Instructions for how to create the ultimate phishing bot!
Then after a while someone ...disappears from your life (to not get too dark with other suggestions), but you keep talking to her. Turns into a business.
Great write-up! Here's a similar experiment but instead fine-tuned on WhatsApp 1-on-1 chats, technically simpler with OpenAI APIs: https://github.com/rchikhi/GPT-is-you/blob/main/README.md
Things only Hacker News would entail or wish for.
An idea along the same lines: https://www.cnet.com/culture/eternime-wants-you-to-live-fore...
Replacing you after you're dead for your loved ones to keep interacting with you.
Can vouch for vast.ai as well. At these prices, anyone can get into LLM fine-tuning at their leisure.
The article frequently mentions costs but never gives any numbers or a point of reference. As an outsider to LLM and training I find this disorienting.
What would be e.g. a total cost for a project like this?
I've actually been working on something similar for the Discord server I have with my friends. Fun to see others doing something similar! It's very funny to mess around with.
Why this obsession with "replacement" instead of tooling?
A few days ago, while explaining what ChatGPT is to a friend, I speculated that it would be possible to teach a bot how you reason so well that it would in effect become you.
You can live forever.
And now this.
I hate to go there but this could be used to have non-consensual cybersex. I guess? So many weird twists and turns these LLMs have exposed.
This seems like a gold mine for funeral homes...
> I am so bad at iterating over dataframes! It always feels horrible and slow. While doing this though, I discovered that using df.to_dict('records') and then iterating over the resulting dictionary is almost 100x faster than using the pandas built-in iteration tools like df.itertuples() or df.iterrows()!
That's really surprising to hear, any context on why this is? Very fun read BTW, my friends and I have joked about making something similar for our DMs (nicknamed MattGPT) and giving "them" topics to discuss + observing what they come up with.
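My understanding (as an assumption worth verifying with a profiler): `iterrows()` constructs a brand-new pandas Series for every row, paying for index alignment and dtype machinery on each iteration, while `to_dict('records')` converts the whole frame to plain Python dicts once, so the loop body is just dictionary lookups. A minimal sketch showing both approaches produce identical results:

```python
import pandas as pd

df = pd.DataFrame({"a": range(1000), "b": range(1000)})

# iterrows() yields (index, Series) pairs; each Series is a fresh
# object with its own index and dtype handling, which is the
# likely source of the overhead.
slow = [row["a"] + row["b"] for _, row in df.iterrows()]

# to_dict('records') pays one up-front conversion cost, then the
# loop touches only plain dicts.
fast = [rec["a"] + rec["b"] for rec in df.to_dict("records")]

assert slow == fast  # same answers, very different per-row cost
```

One caveat with the `records` approach: the conversion materializes the whole frame in memory as Python objects, so on very wide or very large frames vectorized operations (or `itertuples`) may still be the better tool.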
How long until LLMs can coach you to become your dreamed self, thus transforming human experience into empowered vs. non?
Ok I assume somebody is already training on HN responses, speak up and point us to the github url, thanks in advance
Very fun! Sadly the examples and code things on the page weren't working for me, but I like the concept. Of course this would be a terrible thing socially if overused or whatever, but let's get away from techno-dystopia worrying and acknowledge this is a neat toy project with the potential for shared laughs!
Someone should make a Harry Potter paintings-themed webapp where you can talk with LLM-powered figures for fun.
This was great, I'd love to do something like this but all my group chats are on WhatsApp or Signal!
Now that it can accurately compare apples to oranges I think we can synthesize a true Scotsman.
One wonders if, eventually, historians will plug all the content from a person's personal notes, their diary, and their chat logs into an LLM, and perform research by talking to the AI about the person's life.
If you trained an LLM against all the recorded discussions of Einstein - is it that different from talking to Einstein himself?
should you not have disclosed that you're a Hex employee in the blogpost?
Actually many many people go through such conversations inside their brain.
This is why I prefer signal with a short time until auto delete.
Train one on yourself to find out if you are annoying or not.
How well would this work with the public messages of people like Elon Musk or Donald Trump? Imagine some company training their chat-lovebots on celebs and selling them as a service. Or a creepy "friend" making a secret bot of you and incorporating sexual content.
And a disadvantage of this: you can only emulate the public image of a person. It won't really capture the inner workings of a person, and the "person" won't grow over time.
how do you build the knowledge and intuition around how to do this?
That’s a really good idea
Neat. But I fail to see a useful use case for this. As a learning experience it's great, though.
> On a technical level, I found it really helped me wrap my head around what LLMs are doing and how they can be tuned for specific scenarios.
LMAO, noob!
(I guess people don't like when a reply in the tone the OP's friends is posted.)
Looking for a browser extension to filter out all GPT/LLM/AI/... noise from the HN front page. This is going too far.
Anybody?
While I love all these stories of turning your friends and loved ones into chat bots so you can talk to them forever, my brain immediately took a much darker turn because of course it did.
How many emails, text messages, hangouts/gchat messages, etc, does Google have of you right now? And as part of their agreement, they can do pretty much whatever they like with those, can't they?
Could Google, or any other company out there, build a digital copy of you that answers questions exactly the way you would? "Hey, we're going to cancel the interview- we found that you aren't a good culture fit here in 72% of our simulations and we don't think that's an acceptable risk."
Could the police subpoena all of that data and make an AI model of you that wants to help them prove you committed a crime and guess all your passwords?
This stuff is moving terrifyingly fast, and laws will take ages to catch up. Get ready for a wild couple of years my friends.