Show HN: Subth.ink – write something and see how many others wrote the same

Hey HN, this is a small Haskell learning project that I wanted to share. It's just a website where you can see how many people write the exact same text as you (thought it was a fun idea).

It's built using Scotty, SQLite, Redis and Caddy. Currently it's running on a small DigitalOcean droplet (1 GB RAM).

Using Haskell for web development (specifically with Scotty) was slightly easier than I thought, but still a relatively hard task compared to other languages. One of my main friction points was Haskell's multiple string-like types: String, Text (& lazy), ByteString (& lazy), and each library choosing to consume a different one amongst these. There is also a soft requirement to learn monad transformers (e.g. to understand what liftIO is doing) which made the initial development more difficult.

Your thought's hash is: 7456a8269266134d67e9e0b2b26dbbc2227ba976add87c05e91e4cc9937b8b21 You are the first person with that thought. Congratulations!

"You are absolutely right!"

Well, at least we know Claude didn't hit the API yet :)

I said "I love my wife". Apparently, I was the first. Then I said "penis". I was the fifth.

Neat!

Hey that's my wordle opener!

i said penis

Me too, and 16 other users

95 other users*

Some of the top items:

  hello world
  4ef69019c65909ffbb470597e3c5afe05ea8a866a0d3b9f950f0bcf057924b52

  hello
  4358f43b660389eecd435dc2a5f5cee29786245cd2cff27bd4de0b3e8fd53b79

  4ef69019c65909ffbb470597e3c5afe05ea8a866a0d3b9f950f0bcf057924b52
  406cc6dbc566bf6c672a2167868341e9853f7fbbd2a21eb1caa4d08006abae41

  hi
  661ce2e5ed28422eb8b51ec2a217c976e05e37713246166e8fcbf67be4824380

  test
  83d34c0abee918ed3edf585b6cb8ce97fe8286027b012bacdfa71b967924f9b2

  a
  beef7c4d3141c30ab4f6ebf1f724936c50f609ee1915951d802046ba1d9fa23d

  subth.ink
  3f3b05abaec959c9950d5a93a64525971c7d9fcabf6436d653edba62f29d5bea

  lol
  39567a3cc35a4c68d72d01beac88414d0ced5c20b437ff9bc6e2cb20615a47b7

Thanks to Y@Y for 4ef69019c65909ffbb470597e3c5afe05ea8a866a0d3b9f950f0bcf057924b52.

Currently number three is:

  4ef69019c65909ffbb470597e3c5afe05ea8a866a0d3b9f950f0bcf057924b52

with hash:

  406cc6dbc566bf6c672a2167868341e9853f7fbbd2a21eb1caa4d08006abae41

i.e. the hash of "hello world"

fedb9943d8c4c51392815a187ce4ba732c539038fd28b4bda8543e4616d767c1

the nword

Apologies for that, I have removed it.

Thanks, just seeing the hash made me literally shake

Yeah me too

This is great, but it never seems to say that it's an original thought, always defaulting to: "Including you, 1 person had that thought already! First time was less than a minute ago, last time was less than a minute ago."

I've changed this now, thanks for the feedback

This would be more interesting if it was generalized. Using a hash, even one character difference will result in a miss.

If I could have it analyze my blog and then find people who have similar ideas that would be incredibly useful.

To be really honest, they could take a look at Bao. (I used it for an eerily similar project, though it's great that this one is getting traction! I do feel like the Scuttlebutt protocol might be a good implementation for most use cases as well.)

Bao allows two texts to share a common hash for their first n pieces of content, so you can loop over each successive word and see how long the common part of their hashes is; that length becomes the measure of similarity.

An issue might come up if a word changes at the start while the rest is similar, but I feel like Bao could/does support that as well. My information on Bao is pretty rusty (get the pun? It's written in Rust), but I am sure that this idea is technically possible & I hope someone experienced in the field can say more about it.

https://github.com/oconnor663/bao. Oconnor's Bao video or documentaries on YouTube are so good, worth a watch & worth a star (though they do mention that it's a little less formally cryptographically solved, iirc, but it's still pretty robust imo)

True! That would be a more powerful approach. Here I kept it quite basic since I was not very familiar with the tooling. I do apply lowercasing of text + some whitespace stripping in order to increase the number of collisions a bit.

Edit: any other "quick hacks" to increase the number of collisions are welcome :)
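
For anyone curious, here is a minimal Python sketch of the normalization described above (lowercasing plus whitespace stripping). It assumes plain SHA-256 over UTF-8 bytes and also collapses internal whitespace runs, which are both guesses on my part; the site's actual scheme isn't shown in this thread, so these digests won't match the ones it reports. The names `normalize` and `thought_hash` are made up for illustration.

```python
import hashlib


def normalize(thought: str) -> str:
    # Lowercase, trim both ends, and collapse internal whitespace runs
    # into a single space (the collapsing is an assumption).
    return " ".join(thought.lower().split())


def thought_hash(thought: str) -> str:
    # Assumption: plain, unsalted SHA-256 of the normalized text.
    return hashlib.sha256(normalize(thought).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Both spellings normalize to "hello world" and so share one digest.
    print(thought_hash("  Hello   World "))
    print(thought_hash("hello world"))
```

Anything the normalization does not erase (punctuation, typos, word order) still produces a completely different digest.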

https://en.wikipedia.org/wiki/Locality-sensitive_hashing
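
To make that concrete, here is a rough Python sketch of SimHash, one member of the locality-sensitive hashing family. It is only an illustration, not something the site does: the character 3-gram features, the 64-bit width, and the function names `simhash` and `hamming` are all my own choices.

```python
import hashlib

BITS = 64  # fingerprint width; an arbitrary choice for this sketch


def simhash(text: str) -> int:
    # Normalize, then split into overlapping character 3-grams.
    text = " ".join(text.lower().split())
    grams = [text[i:i + 3] for i in range(max(1, len(text) - 2))]
    weights = [0] * BITS
    for gram in grams:
        # Take the first 8 bytes of SHA-256 as a 64-bit feature hash.
        digest = hashlib.sha256(gram.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for bit in range(BITS):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    # A fingerprint bit is set when most feature hashes had it set.
    return sum(1 << bit for bit in range(BITS) if weights[bit] > 0)


def hamming(a: int, b: int) -> int:
    # Number of differing bits between two fingerprints.
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    base = simhash("hello world")
    print(hamming(base, simhash("hello world!")))
    print(hamming(base, simhash("an unrelated sentence about cats")))
```

Texts that share most of their 3-grams tend to land a small Hamming distance apart, so a lookup could count fingerprints within some distance threshold instead of requiring an exact match.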

Natural to use LM embeddings for this.

Yeah, convert to embedding, check if it's within a certain distance to an existing embedding and if so store it with that cluster and increment? Then check further entries against an average so clusters don't increase their "reach" indefinitely.
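
Something like the following Python sketch, assuming the embeddings already exist as plain lists of floats (which model produces them is left open). The `Cluster` class, the `assign` function, and the 0.1 cosine-distance threshold are all hypothetical. New entries are compared against each cluster's running average rather than against its individual members.

```python
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat a zero vector as unrelated to everything
    return 1.0 - dot / (norm_a * norm_b)


class Cluster:
    def __init__(self, vec):
        self.centroid = list(vec)
        self.count = 1

    def add(self, vec):
        # Incremental mean: the centroid stays the average of all members.
        self.count += 1
        self.centroid = [c + (v - c) / self.count
                         for c, v in zip(self.centroid, vec)]


def assign(clusters, vec, max_distance=0.1):
    # Find the nearest centroid; join it if close enough, else start anew.
    best = min(clusters,
               key=lambda c: cosine_distance(c.centroid, vec),
               default=None)
    if best is not None and cosine_distance(best.centroid, vec) <= max_distance:
        best.add(vec)
        return best
    fresh = Cluster(vec)
    clusters.append(fresh)
    return fresh


if __name__ == "__main__":
    clusters = []
    assign(clusters, [1.0, 0.0])
    assign(clusters, [0.99, 0.05])  # close to the first vector, joins it
    assign(clusters, [0.0, 1.0])    # orthogonal, starts a new cluster
    print([c.count for c in clusters])
```

The count shown to the user would then be the size of the cluster a thought lands in.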

That is a problem. Also, a long paragraph would likely never be hashed the same because of a comma or capital letter, so the builder of this would need to cap the length of the thought and make all thoughts lowercase without punctuation.

i agree removing punctuation wouldve been a good idea alas it may be a bit too late since that would modify the hash of previous inputs in the future hmm but i will think about it

I'm the 18th meow :3 Honestly, thank you so much for posting this. I love small and fun projects.

I love it!

I typed "hello".

> Your thought's hash is: 4358f43b660389eecd435dc2a5f5cee29786245cd2cff27bd4de0b3e8fd53b79

> Including you, 267 persons had that thought already!

> First time was 4 hours, 14 minutes ago, last time was less than a minute ago.

Of course, everyone else has thought of this. But what if I "type": 4358f43b660389eecd435dc2a5f5cee29786245cd2cff27bd4de0b3e8fd53b79

> Your thought's hash is: c37d0a8c512b9ec7074d3bc77c4545d58fdfcde55bad89a70ede71ac2ac0000d

> Including you, 8 persons had that thought already!

> First time was 2 hours, 1 minute ago, last time was 1 minute ago.

That's hilarious!

And also, "typing": echo "hello world" | curl -d @- https://subth.ink

>Your thought's hash is: c5ba1c7e35345dbb8c2dc6be0972d0b6ddf6c6515143b64c057296948e2ba8cd

>Including you, 10 persons had that thought already!

>First time was 1 hour, 52 minutes ago, last time was 2 minutes ago.

I love this. Shouting into the void, with the distinct hope that if the idea was popular enough, it'd be brute-forced back into existence.

I noticed that the input is not being treated in any way before hashing. I'd remove all non-letter characters, and then lowercase everything before hashing to help with some unnecessary misses.

> It (the MD5 hash) might be published in the future when a thought's count passes a certain threshold (TBD). This might make it possible to recover certain short thoughts that were popular.

This makes little sense. Recovering a random preimage of an MD5 hash is marginally easier [1] than a (128-bit truncated) SHA256 hash, but this won't recover any sensible message.

Recovering a sensible (short) message is equally hard for both hashes.

[1] https://link.springer.com/chapter/10.1007/978-3-642-01001-9_...
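
To illustrate the last point: recovering a short, sensible message is a guessing game over candidate texts, and the loop looks the same whichever hash is used. A rough Python sketch, assuming the published digest is a plain unsalted hash of the text (the site may well do something different); `recover` and the candidate list are made up for the example.

```python
import hashlib
from typing import Iterable, Optional


def recover(target_hex: str, candidates: Iterable[str],
            algorithm: str = "sha256") -> Optional[str]:
    # Hash every guess and compare; the cost is one hash per candidate,
    # so the choice of algorithm (e.g. "md5" vs "sha256") barely matters.
    for text in candidates:
        digest = hashlib.new(algorithm, text.encode("utf-8")).hexdigest()
        if digest == target_hex:
            return text
    return None


if __name__ == "__main__":
    target = hashlib.sha256(b"hello world").hexdigest()
    guesses = ["hi", "test", "hello", "hello world", "lol"]
    print(recover(target, guesses))  # prints "hello world"
```

What makes a thought recoverable is whether it appears in the attacker's candidate list, not which hash function was chosen.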

Neat idea! I love these kinds of low-stakes online interactions, a bit like 1,000,000 checkboxes as well - makes me realize how many others there are out there and invokes a strange but nice feeling of community :)

"helloworld"

Your thought's hash is: 06ad246627b5f973559a1dbcf2a6b96791d9b15ed2d8cb45c344f98b14d10f76 Including you, 1 person had that thought already! First time was less than a minute ago, last time was less than a minute ago.

haha, cool.

  Your thought's hash is: 295c1f32c2fa180b5425c2b502e1d3968a7639c8ec398d66ec2e4ff73c05a1ea
  Including you, 2 persons had that thought already!

You guys know who you are

I think a few more than 2 people have said that now.

"I am happy"

Including you, 7 persons had that thought already!

"I am sad"

Including you, 9 persons had that thought already!

This concept is a duplicate :) we already had r9k

I was very surprised to not be the first to enter a quotation by Nelson Rodrigues. Nice.

Next step: embeddings and similarity.

Fun idea. One potential issue: the same person writing the same text repeatedly will count more than once, so it'd be pretty trivial to spoil the rankings. (This is why we can't have nice things.)
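
One way around that is to count distinct submitters per thought instead of raw submissions. A minimal in-memory Python sketch; the `submitter_id` (say, a hashed IP address or a session token) is a hypothetical identifier, and nothing in this thread says the site tracks one.

```python
from collections import defaultdict


class ThoughtCounter:
    def __init__(self):
        # Maps a thought's hash to the set of submitters who sent it.
        self._submitters = defaultdict(set)

    def submit(self, thought_hash: str, submitter_id: str) -> int:
        # Adding to a set is idempotent, so repeats by the same submitter
        # do not inflate the count.
        self._submitters[thought_hash].add(submitter_id)
        return len(self._submitters[thought_hash])


if __name__ == "__main__":
    counter = ThoughtCounter()
    print(counter.submit("abc", "alice"))  # 1
    print(counter.submit("abc", "alice"))  # still 1
    print(counter.submit("abc", "bob"))    # 2
```

This only raises the bar: anyone who can rotate identifiers can still inflate a count.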

Another comment here as I got way too excited in the other one, but this is genuinely so good man!!! KUDOS!

It actually provides a simple curl command. Oh boy, this does open up a few more ideas. I feel like my wall of text -> link shortener / blog and all other comments on that wall of text being comments themselves might be implemented & this does open up to a lot of possibilities

I actually got a vps of like 8 gigs 4 cores 500 gigs ssd for 3 months prepaid and I snatched it during a recurring deal.

If you want, I can transfer it to you or share half the resources with you if this project ever needs it.

One of the most interesting things is that this (unlike my idea, which was just a "proof" of whether it was possible in a more complex environment) actually does make it simple for normal devs to build upon

You mention Scotty, and I am not sure whether you mean Scuttlebutt the protocol or whether Scotty is some Haskell web framework (sorry, I don't know Haskell)

What are your thoughts on Scuttlebutt or nanotimestamps? I have it open source under the MIT license for anyone to build on top of.

Your project's really polished and I admire it but I would hope that you can look more at the decentralization side of things because one of the ideas I had which never got to fruition was that adding on top of it, we can just have a social media similar to nostr but without the relay mess that nostr has in many instances (or so I have heard)

I am curious what use cases you have in mind for it, as I'd love to know your opinion!

Have a nice day man!

I was the first one with "here be dragons"? come on

wow seven people have the same password as me

"67" Your thought's hash is: 098754435bbbe041e9beb5d99e28d8256ad1d064f768332a976ffa6083b535c2 Including you, 31 persons had that thought already! First time was 20 hours, 43 minutes ago, last time was 2 minutes ago.

lol

I was the first to state that, contrary to popular belief, "the quick brown fox did not jump over the fence"

Popular belief holds that the quick brown fox jumped over the lazy dog, not the fence.

Someone might run:

curl -s https://www.cs.cmu.edu/~biglou/resources/bad-words.txt | tr -d '\r' | while read -r w; do curl -s -X POST https://subth.ink/api/thoughts -H 'Content-Type: application/json' -d "{\"contents\":\"$w\"}"; done

someone declared bankruptcy before me in the office style

80f9d25eb732197e10d71597dca181e7a454eeda3cc484b1c3e129109b41db23

this one is going up fast, no wonder

[flagged]

Maybe you should submit it?

Done. thanks for the suggestion.

https://news.ycombinator.com/item?id=46684789 [Nanotimetamps: Time-Stamped Data on Nano Block Lattice]