The catarrhine yerba mate enjoyer who invented a perpetual motion machine, by dreaming at night and devouring its own dreams through the day.

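Кўис кредис ессе, Беллум? (Latin written in Cyrillic script: "Quis credis esse, Bellum?", roughly "Who do you think you are, War?")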

  • 15 Posts
  • 204 Comments
Joined 4 years ago
Cake day: April 9th, 2021


  • Even in this thread there’s discussion of a show that blatantly titillates the audience with underage characters that would absolutely qualify as CSAM in any other community except in the anime community, for some reason.

    Emphasis mine. If what you are saying is indeed correct (is it? dunno), this is a sign that the acronym “CSAM” was completely derailed.

    Originally the expression “child sexual abuse material” was coined to avoid implications of consent brought by the word “pornography”, and it boils down to “evidence of child sexual abuse”. Consent and sexual abuse are legal notions that only apply to real people, not to fictional characters.

    In the meantime, at worst the instance in question hosts images of clearly fictional characters in suggestive poses and/or clothing. That does not even qualify as pornography, let alone sexual abuse. (Note that not even hentai depicting clearly adult characters is allowed on that instance.)

    I don’t care about what the maintainers’ view of the matter is, I make (and sometimes delete) my comments based on my own view of it.

    Given that this is a touchy subject, I think that this matter is better handled neither by the maintainers’ views nor by our own views, but by 1) legal definitions of governments that might be relevant in the matter, and 2) explicit moral premises.


  • Yeah, but the admins, as the thread has shown, are mainly reining in violations of sitewide policy. Instance rules are mainly the job of mods.

    So the admins are reining in violations of lemmy.ml-wide policy… while lemmy.ml rules are mainly the job of the mods??? Congratulations, that’s the dumbest thing that I’ve read today.

    Couple the above with the backpedalling (from “This is what mods are for.” to “Instance rules are mainly the job of mods.”; emphasis on “mainly”) - a sleight of hand, while lying that I was the one using a sleight of hand - and I’m led to the conclusion that you have nothing meaningful to add to this discussion, and can be safely ignored as dead weight and noise.


    Unlike the above, does anyone here have any decent counter-argument against “migrating this comm to that other instance would be sensible”?


  • The source that I’ve linked mentions semantic embedding; so does further literature on the internet. However, the operations are still being performed with the vectors resulting from the tokens themselves, with said embedding playing a secondary role.

    This is evident for example through excerpts like

    The token embeddings map a token ID to a fixed-size vector with some semantic meaning of the tokens. These brings some interesting properties: similar tokens will have a similar embedding (in other words, calculating the cosine similarity between two embeddings will give us a good idea of how similar the tokens are).

    Emphasis mine. A similar conclusion (that the LLM is still handling the tokens, not their meaning) can be reached by analysing the hallucinations that your typical LLM bot outputs, and asking why each hallucination is there.
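
    To illustrate the cosine similarity that the excerpt mentions, here is a minimal Python sketch. The words and the three-dimensional vectors are made up for illustration; they do not come from any real model, whose embeddings typically have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings" (assumption: values invented for this example).
toy_embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

cat_dog = cosine_similarity(toy_embeddings["cat"], toy_embeddings["dog"])
cat_car = cosine_similarity(toy_embeddings["cat"], toy_embeddings["car"])

# Vectors pointing in similar directions score close to 1,
# nearly orthogonal ones score close to 0.
print(f"cat vs dog: {cat_dog:.3f}")
print(f"cat vs car: {cat_car:.3f}")
```

    With these toy values, "cat" scores far closer to "dog" than to "car", which is the sense in which similar tokens end up with similar embeddings.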

    What I’m proposing is deeper than that. It’s to use the input tokens (i.e. morphemes) only to retrieve the sememes (units of meaning; further info here) that they’re conveying, then discard the tokens themselves, and perform the operations solely on the sememes. Then for the output you translate the sememes obtained by the transformer back into morphemes (i.e. tokens).
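
    A very rough Python sketch of the two translation steps in that idea, morphemes to sememes on the way in and sememes to morphemes on the way out. Everything in it is an assumption made for illustration: the lexicon, the sememe names, the `RELATED` hints and the disambiguation rule are all invented, and the transformer that would operate on the sememes in between is left out entirely.

```python
# Hypothetical toy lexicon: each token maps to its candidate sememes.
LEXICON = {
    "bank": ["FINANCIAL_INSTITUTION", "RIVER_EDGE"],
    "river": ["RIVER"],
    "money": ["MONEY"],
    "shore": ["RIVER_EDGE"],
}

# Hypothetical hints: sememes that make a candidate more plausible.
RELATED = {
    "FINANCIAL_INSTITUTION": {"MONEY"},
    "RIVER_EDGE": {"RIVER"},
}

# Hypothetical output table: one preferred token per sememe.
SURFACE = {
    "FINANCIAL_INSTITUTION": "bank",
    "RIVER_EDGE": "shore",
    "RIVER": "river",
    "MONEY": "money",
}

def tokens_to_sememes(tokens):
    """Replace each token by the candidate sememe best supported by context."""
    # Unambiguous tokens provide the context used for disambiguation.
    context = {LEXICON[t][0] for t in tokens if len(LEXICON[t]) == 1}
    sememes = []
    for token in tokens:
        candidates = LEXICON[token]
        # Pick the candidate sharing most related sememes with the context;
        # on a tie, max() keeps the first candidate listed in the lexicon.
        best = max(candidates, key=lambda s: len(RELATED.get(s, set()) & context))
        sememes.append(best)
    return sememes

def sememes_to_tokens(sememes):
    """Translate sememes back into output tokens."""
    return [SURFACE[s] for s in sememes]

print(tokens_to_sememes(["river", "bank"]))
print(tokens_to_sememes(["money", "bank"]))
print(sememes_to_tokens(tokens_to_sememes(["river", "bank"])))
```

    Note how the same token "bank" ends up as two different sememes depending on context, and how the output side is free to pick a different token ("shore") for the same meaning, since the tokens were discarded after the lookup.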

    I believe that this would have two big benefits:

    1. The amount of data necessary to “train” the LLM will decrease. Perhaps by orders of magnitude.
    2. A major type of hallucination will go away: self-contradiction (for example: states that A exists, then that A doesn’t exist).

    And it might be an additional layer, but the whole approach is considerably simpler than what’s being done currently - pretending that the tokens themselves have some intrinsic value, then playing whack-a-mole with situations where the token and the contextually assigned value (by the human using the LLM) differ.

    [This could even go deeper, handling a pragmatic layer beyond the tokens/morphemes and the units of meaning/sememes. It would be closer to what @njordomir@lemmy.world understood from my other comment, as it would then deal with the intent of the utterance.]


  • Not quite. I’m focusing on chatbots like Bard, ChatGPT and the likes, and their technology (LLM, or large language model).

    At the core those LLMs work like this: they pick words, split them into “tokens”, and then perform a few operations on those tokens, across multiple layers. But at the end of the day they still work with the words themselves, not with the meaning being encoded by those words.

    What I want is an LLM that assigns multiple meanings for those words, and performs the operations above on the meaning itself. In other words the LLM would actually understand you, not just chain words.


  • Complexity does not mean sophistication when it comes to AI and never has and to treat it as such is just a forceful way to make your ideas come true without putting in the real effort.

    It’s a bit off-topic, but what I really want is a language model that assigns semantic values to the tokens, and handles those values instead of directly working with the tokens themselves. That would probably be far less complex than current state-of-the-art LLMs, but way more sophisticated, and would require far less data for “training”.




  • If you want, you could use Gmail filters to delete those emails automatically. Here’s how:

    1. click the gear button (settings), then “see all settings”, then “filters and blocked addresses”.
    2. click “create a new filter”. Add “top of Google search” to the field “has the words”, leave other fields blank.
    3. click “create filter”, then check the “delete it” box, then “create filter” again.
    4. repeat steps 2-3 for other shit that SEO spam is likely to mention.

    Important: never use as a filter anything that legitimate users might reasonably say. Only things that you’re fairly certain to come from a spammer.
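
    If you'd rather script this than click through the settings, the same filters can be described as plain data. The Python sketch below only builds the filter bodies; as far as I know they match the shape that the Gmail API's `users.settings.filters.create` method expects (`criteria.query` for “has the words”, and adding the `TRASH` label for “delete it”), but treat that mapping as an assumption and check the API reference before relying on it. The helper name `build_delete_filter`, the `SPAM_PHRASES` list and the choice to wrap the phrase in quotes for an exact match are mine, and authentication plus the actual API call are not shown.

```python
def build_delete_filter(phrase):
    """Build a filter body that sends mail containing `phrase` to the trash."""
    return {
        # Assumption: quoting the phrase asks for an exact-phrase match.
        "criteria": {"query": f'"{phrase}"'},
        # Assumption: adding the TRASH label is the API equivalent of "delete it".
        "action": {"addLabelIds": ["TRASH"]},
    }

# Only phrases that a legitimate sender would be very unlikely to use.
SPAM_PHRASES = ["top of Google search"]

filters = [build_delete_filter(phrase) for phrase in SPAM_PHRASES]

for body in filters:
    print(body)
```

    Adding another phrase to `SPAM_PHRASES` corresponds to step 4 above.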

    EDIT: I repeated two steps without noticing it. My bad.