Show HN: Subth.ink – write something and see how many others wrote the same
Hey HN, this is a small Haskell learning project that I wanted to share. It's just a website where you can see how many people write the exact same text as you (thought it was a fun idea).
It's built using Scotty, SQLite, Redis and Caddy. Currently it's running on a small DigitalOcean droplet (1 GB RAM).
Using Haskell for web development (specifically with Scotty) was slightly easier than I thought, but still a relatively hard task compared to other languages. One of my main friction points was Haskell's multiple string-like types: String, Text (& lazy), ByteString (& lazy), and each library choosing to consume a different one amongst these. There is also a soft requirement to learn monad transformers (e.g. to understand what liftIO is doing) which made the initial development more difficult.
Your thought's hash is: 7456a8269266134d67e9e0b2b26dbbc2227ba976add87c05e91e4cc9937b8b21 You are the first person with that thought. Congratulations!
"You are absolutely right!"
Well, at least we know claude didn't hit the API yet :)
I said "I love my wife". Apparently, I was the first. Then I said "penis". I was the fifth.
Neat!
Hey that's my wordle opener!
i said penis
Me too, and 16 other users
95 other users*
Some of the top items:
Thanks to Y@Y for 4ef69019c65909ffbb470597e3c5afe05ea8a866a0d3b9f950f0bcf057924b52. Currently number three is:
with hash: i.e. the hash of "hello world" fedb9943d8c4c51392815a187ce4ba732c539038fd28b4bda8543e4616d767c1
the nword
Apologies for that, I have removed it.
Thanks, just seeing the hash made me literally shake
Yeah me too
This is great, but it never seems to say that it's an original thought, always defaulting to: "Including you, 1 person had that thought already! First time was less than a minute ago, last time was less than a minute ago."
I've changed this now, thanks for the feedback
This would be more interesting if it was generalized. Using a hash, even one character difference will result in a miss.
If I could have it analyze my blog and then find people who have similar ideas that would be incredibly useful.
To be really honest, they could take a look at bao. (I used it for an eerily similar project, though it's great that this one is getting traction! I do feel like the scuttlebutt protocol might be a good implementation for most use cases as well.)
Bao allows us to have a common hash for the first n contents of the text, so you can loop over each successive word to see how long the common hashed prefix is, and that length becomes the measure of similarity.
An issue might arise if a word changes at the start and the rest is similar, but I feel like bao could (or does) support that as well. My information on bao is pretty rusty (get the pun? It's written in Rust), but I am sure that this idea is technically possible, and I hope someone experienced in the field can say more about it.
https://github.com/oconnor663/bao. O'Connor's videos or documentaries about bao on YouTube are so good, worth a watch and worth a star (though they do mention that it's a little less formally cryptographically solved, iirc, but it's still pretty robust imo).
True! That would be a more powerful approach. Here I kept it quite basic since I was not very familiar with the tooling. I do apply lowercasing of text + some whitespace stripping in order to increase the number of collisions a bit.
Edit: any other "quick hacks" to increase the number of collisions are welcome :)
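For anyone who wants to experiment with this, here is a minimal sketch of the "normalize, then hash" idea in Python. It is not the site's actual code: SHA-256 is an assumption (the displayed hashes are 64 hex characters), any salt the site may use is ignored, and the names `normalize` and `thought_hash` are made up for illustration.

```python
import hashlib
import re
import unicodedata


def normalize(thought: str, strip_punctuation: bool = False) -> str:
    """Canonicalize a thought so trivially different spellings compare equal."""
    # NFKC folds compatibility forms (e.g. full-width letters); casefold is a
    # more aggressive lowercase that also handles cases like the German sharp s.
    text = unicodedata.normalize("NFKC", thought).casefold()
    if strip_punctuation:
        # Replace everything that is not a word character (letter, digit,
        # underscore) or whitespace with a space.
        text = re.sub(r"[^\w\s]", " ", text)
    # Collapse runs of whitespace and trim both ends.
    return " ".join(text.split())


def thought_hash(thought: str, strip_punctuation: bool = False) -> str:
    """Hash the normalized thought (SHA-256 is an assumption, not the site's spec)."""
    normalized = normalize(thought, strip_punctuation)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

With `strip_punctuation=True`, "Hello, world!" and "hello world" land on the same hash; with the default they stay distinct, which is closer to the lowercasing plus whitespace stripping described above.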
https://en.wikipedia.org/wiki/Locality-sensitive_hashing
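To make the link concrete, here is a rough sketch of SimHash, one locality-sensitive hashing scheme, over character trigrams. This is only an illustration of the technique, not anything the site does; the shingle size, the 64-bit width, and the function names are all assumptions chosen for the example.

```python
import hashlib

BITS = 64  # fingerprint width; fixed because we take 8 bytes of each shingle hash


def shingles(text: str, k: int = 3) -> set[str]:
    """Character k-grams of the lowercased, whitespace-collapsed text."""
    text = " ".join(text.lower().split())
    if len(text) <= k:
        return {text}
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def simhash(text: str) -> int:
    """64-bit SimHash: similar texts tend to get fingerprints that differ in few bits."""
    counts = [0] * BITS
    for sh in shingles(text):
        digest = hashlib.sha256(sh.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for b in range(BITS):
            # Each shingle votes +1 or -1 on every bit position.
            counts[b] += 1 if (h >> b) & 1 else -1
    # A bit is set in the fingerprint when the positive votes win.
    return sum(1 << b for b in range(BITS) if counts[b] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

Two thoughts would then count as "the same" when `hamming(simhash(a), simhash(b))` falls under some threshold, which has to be tuned by hand; unlike an exact hash, this allows near misses to match.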
Natural to use LM embeddings for this.
Yeah, convert to embedding, check if it's within a certain distance to an existing embedding and if so store it with that cluster and increment? Then check further entries against an average so clusters don't increase their "reach" indefinitely.
That is a problem. Also, a long paragraph would likely never be hashed the same because of a comma or capital letter, so the builder of this would need to cap the length of the thought and make all thoughts lowercase without punctuation.
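A small sketch of that clustering idea, assuming the embedding vectors come from some model that is not shown here (the vectors are passed in as plain lists). The class name `ThoughtClusters` and the default `threshold` are hypothetical; the running mean is what keeps a cluster from drifting outward with every new member.

```python
import math
from dataclasses import dataclass


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


@dataclass
class Cluster:
    centroid: list[float]
    count: int = 1

    def add(self, vec: list[float]) -> None:
        # Incremental mean: the centroid stays the average of all members.
        self.count += 1
        self.centroid = [c + (v - c) / self.count for c, v in zip(self.centroid, vec)]


class ThoughtClusters:
    def __init__(self, threshold: float = 0.2) -> None:
        self.threshold = threshold  # assumed value; needs tuning per embedding model
        self.clusters: list[Cluster] = []

    def submit(self, vec: list[float]) -> int:
        """Add an embedding and return how many thoughts share its cluster."""
        best = min(
            self.clusters,
            key=lambda c: cosine_distance(c.centroid, vec),
            default=None,
        )
        if best is not None and cosine_distance(best.centroid, vec) <= self.threshold:
            best.add(vec)
            return best.count
        self.clusters.append(Cluster(list(vec)))
        return 1
```

The linear scan over clusters is fine for a toy; a real deployment would presumably want an approximate nearest-neighbour index instead.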
i agree removing punctuation wouldve been a good idea alas it may be a bit too late since that would modify the hash of previous inputs in the future hmm but i will think about it
I'm the 18th meow :3 Honestly, thank you so much for posting this. I love small and fun projects.
I love it!
I typed "hello".
> Your thought's hash is: 4358f43b660389eecd435dc2a5f5cee29786245cd2cff27bd4de0b3e8fd53b79
> Including you, 267 persons had that thought already!
> First time was 4 hours, 14 minutes ago, last time was less than a minute ago.
Of course, everyone else has thought of this. But what if I "type": 4358f43b660389eecd435dc2a5f5cee29786245cd2cff27bd4de0b3e8fd53b79
> Your thought's hash is: c37d0a8c512b9ec7074d3bc77c4545d58fdfcde55bad89a70ede71ac2ac0000d
> Including you, 8 persons had that thought already!
> First time was 2 hours, 1 minute ago, last time was 1 minute ago.
That's hilarious!
And also, "typing": echo "hello world" | curl -d @- https://subth.ink
>Your thought's hash is: c5ba1c7e35345dbb8c2dc6be0972d0b6ddf6c6515143b64c057296948e2ba8cd
>Including you, 10 persons had that thought already!
>First time was 1 hour, 52 minutes ago, last time was 2 minutes ago.
I love this. Shouting into the void with the distinct hope that, if the idea was popular enough, it'd be brute-forced back into existence.
I noticed that the input is not being treated in any way before hashing. I'd remove all non-letter characters, and then lowercase everything before hashing to help with some unnecessary misses.
> It (the MD5 hash) might be published in the future when a thought's count passes a certain threshold (TBD). This might make it possible to recover certain short thoughts that were popular.
This makes little sense. Recovering a random preimage of an MD5 hash is marginally easier [1] than a (128-bit truncated) SHA256 hash, but this won't recover any sensible message.
Recovering a sensible (short) message is equally hard for both hashes.
[1] https://link.springer.com/chapter/10.1007/978-3-642-01001-9_...
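To illustrate the point, here is a sketch of how a short, popular thought would realistically be recovered from a published hash: by guessing candidates, not by attacking the hash function. The candidate list and function names are invented for the example, and any salting or normalization the site applies is ignored.

```python
import hashlib
from typing import Callable, Iterable, Optional


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def sha256_128_hex(text: str) -> str:
    # SHA-256 truncated to 128 bits (32 hex characters).
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:32]


def recover(
    target_hex: str,
    candidates: Iterable[str],
    digest: Callable[[str], str],
) -> Optional[str]:
    """Return the first candidate whose digest matches, or None."""
    for candidate in candidates:
        if digest(candidate) == target_hex:
            return candidate
    return None


# Hypothetical guesses; a real attempt would use a large wordlist.
guesses = ["hello", "hello world", "i am happy", "i am sad"]
print(recover(md5_hex("hello world"), guesses, md5_hex))
print(recover(sha256_128_hex("hello world"), guesses, sha256_128_hex))
```

Either way the cost is one hash evaluation per guess, so the work is set by the size of the candidate list and not by which of the two hashes was published.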
Neat idea! I love these kinds of low-stakes online interactions, a bit like 1,000,000 checkboxes as well - makes me realize how many others there are out there and invokes a strange but nice feeling of community :)
"helloworld"
Your thought's hash is: 06ad246627b5f973559a1dbcf2a6b96791d9b15ed2d8cb45c344f98b14d10f76 Including you, 1 person had that thought already! First time was less than a minute ago, last time was less than a minute ago.
haha, cool.
I think a few more than 2 people have said that now.
"I am happy"
Including you, 7 persons had that thought already!
"I am sad"
Including you, 9 persons had that thought already!
This concept is a duplicate :) we already had r9k
I was very surprised to not be the first to enter a quotation by Nelson Rodrigues. Nice.
Next step: embeddings and similarity.
Fun idea. One potential issue: the same person writing the same text repeatedly will count more than once, so it'd be pretty trivial to spoil the rankings. (This is why we can't have nice things.)
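One possible mitigation, sketched under the assumption that the server has some per-submitter identifier available (a hashed IP address or a cookie, say; `submitter_id` is hypothetical and nothing here reflects how the site works): count distinct submitters per thought instead of raw submissions.

```python
import hashlib
from collections import defaultdict


class ThoughtCounter:
    """Counts each (thought, submitter) pair at most once."""

    def __init__(self) -> None:
        # Maps thought hash -> set of submitter ids that have sent it.
        self._seen: dict[str, set[str]] = defaultdict(set)

    def submit(self, thought: str, submitter_id: str) -> int:
        """Record a submission and return the number of distinct submitters."""
        key = hashlib.sha256(thought.encode("utf-8")).hexdigest()
        self._seen[key].add(submitter_id)
        return len(self._seen[key])


counter = ThoughtCounter()
counter.submit("hello", "alice")  # first submitter
counter.submit("hello", "alice")  # repeat is ignored
counter.submit("hello", "bob")    # second distinct submitter
```

This only raises the bar: anyone who can rotate IPs or clear cookies can still inflate a count, and storing submitter ids has a privacy cost of its own.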
Another comment here, as I got way too excited in the other one, but this is genuinely so good, man!!! KUDOS!
It actually provides a simple curl command. Oh boy, this does open up a few more ideas. I feel like my wall of text -> link shortener / blog, with all the other comments on that wall of text being comments themselves, might be implementable, and this opens up a lot of possibilities.
I actually got a vps of like 8 gigs 4 cores 500 gigs ssd for 3 months prepaid and I snatched it during a recurring deal.
If you want, I can transfer it to you or share half the resources with you if this project ever needs it.
One of the most interesting things is that this (unlike my idea, which was just a "proof" that it was possible in a more complex environment) actually makes it simple for normal devs to build upon.
You mention Scotty, and I am not sure if you mean scuttlebutt the protocol or if Scotty is some Haskell web framework (sorry, I don't know Haskell).
What are your thoughts on scuttlebutt or nanotimestamps? I have it open source under the MIT license for anyone to build on top of.
Your project is really polished and I admire it, but I hope you can look more at the decentralization side of things. One of the ideas I had, which never came to fruition, was that by building on top of it we could have a social media similar to nostr but without the relay mess that nostr has in many instances (or so I have heard).
I am curious what use cases you are thinking of for it, as I'd love to know your opinion!
Have a nice day man!
I was the first one with "here be dragons"? come on
wow seven people have the same password as me
"67" Your thought's hash is: 098754435bbbe041e9beb5d99e28d8256ad1d064f768332a976ffa6083b535c2 Including you, 31 persons had that thought already! First time was 20 hours, 43 minutes ago, last time was 2 minutes ago.
lol
I was the first to state that, contrary to popular belief, "the quick brown fox did not jump over the fence"
Popular belief holds that the quick brown fox jumped over the lazy dog, not the fence.
Someone might run:
curl -s https://www.cs.cmu.edu/~biglou/resources/bad-words.txt | tr -d '\r' | while read -r w; do curl -s -X POST https://subth.ink/api/thoughts -H 'Content-Type: application/json' -d "{\"contents\":\"$w\"}"; done
someone declared bankruptcy before me in the office style
80f9d25eb732197e10d71597dca181e7a454eeda3cc484b1c3e129109b41db23
this one is going up fast, no wonder
[dead]
[flagged]
Maybe you should submit it?
Done. thanks for the suggestion.
https://news.ycombinator.com/item?id=46684789 [Nanotimestamps: Time-Stamped Data on Nano Block Lattice]