telexed ~ cat / generative_media · ★4 and up · hourly · UTC+09 · LIVE

Generative media

14 items
Today · 1 dispatch
  • #0014 · Generative media · GeekNews

    `OpenShorts`, Free Open-Source Clip Generator for AI UGC Videos

    70 radar
    OpenShorts: open-source video tool that turns long videos into vertical shorts

    Long videos can be turned into vertical shorts without paying another SaaS bill. The self-hosted setup fits repeatable TikTok, Reels, and YouTube Shorts pipelines.

    • Self-hosted and open source means lower marginal cost than hosted clip tools once video volume grows.
    • Targets TikTok, Reels, and YouTube Shorts, so the output format maps directly to the main short-form channels.
    • Clip Generator converts long-form video into 9:16 shorts with moment selection and face tracking, reducing manual edit time.
    • Three tools are bundled into one workflow. The useful test is whether it can replace separate clipping, reframing, and UGC-generation steps.
    Source: news.hada.io/topic?id=29715
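
The 9:16 reframing step comes down to a small piece of arithmetic. The sketch below is not OpenShorts code: `vertical_crop` and its parameters are hypothetical names, and it assumes a face tracker has already supplied the horizontal face center in pixels. It computes a 9:16 crop window that follows the face and stays inside the frame.

```python
def vertical_crop(frame_w: int, frame_h: int, face_cx: float) -> tuple[int, int, int, int]:
    """Return (x, y, w, h) of a 9:16 crop centered on face_cx, clamped to the frame.

    Hypothetical helper for illustration; not part of OpenShorts.
    """
    # Start from a full-height window with a 9:16 width.
    crop_h = frame_h
    crop_w = frame_h * 9 // 16
    if crop_w > frame_w:
        # Source is narrower than 9:16: keep full width and shrink the height.
        crop_w = frame_w
        crop_h = frame_w * 16 // 9
    # Many video encoders prefer even dimensions, so round down to even.
    crop_w -= crop_w % 2
    crop_h -= crop_h % 2
    # Center horizontally on the face, then clamp so the window stays in frame.
    x = int(face_cx - crop_w / 2)
    x = max(0, min(x, frame_w - crop_w))
    # Center vertically (only matters when the height was reduced).
    y = (frame_h - crop_h) // 2
    return x, y, crop_w, crop_h


print(vertical_crop(1920, 1080, 960))  # face in the middle of a 1080p frame
```

For a 1920x1080 source with the face at x=960, this yields a 606x1080 window starting at x=657. A face near either edge simply pins the window to that edge.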
Yesterday · 5 dispatches
  • `Remotion` + `Claude Code` launch-video workflow, no editor required

    50 radar
    Remotion: React video framework that renders JSX to MP4

    Launch videos can be built like React pages, then rendered to MP4. Cheap, repeatable, and useful when you lack design tools.

    • Remotion turns JSX into MP4, so Claude Code can generate scenes and animation logic using familiar React patterns.
    • The repeatable motion stack is simple: crossfades, one easing curve, grain, vignette, and restrained SFX.
    • The practical bar is editing discipline. Kill any scene that does not earn attention within 3 seconds.
    Source: www.reddit.com/r/ClaudeAI/comments/1tik0qe/coffee_claude
  • `Remove-AI-Watermarks`, CLI and Python Library for Cleaning AI Image Watermarks

    50 radar
    Remove-AI-Watermarks: CLI/Python library that removes AI watermarks and metadata

    Generated-image cleanup is moving into scriptable asset pipelines. Useful for metadata control, but visible watermark removal carries license and platform-policy risk.

    • Handles Gemini, ChatGPT/DALL-E, Stable Diffusion, Adobe Firefly, and Midjourney outputs, so it targets the major image-generation stack.
    • Combines visible watermark, hidden watermark, and AI metadata handling in one CLI/Python library; practical for batch asset workflows.
    • The risky part is visible watermark removal. Before using it in products, check generator terms, stock rules, and app-platform review exposure.
    Source: news.hada.io/topic?id=29702
  • OpenAI Adds Google's `SynthID` Watermarking to AI Images

    60 radar
    SynthID: AI watermarking tech that embeds signals in generated content

    Provenance now combines metadata, signatures, watermarking, and public checks. Useful for asset trust, but transformations can still break parts of the chain.

    • C2PA carries creation and edit context through metadata plus cryptographic signatures; format conversion can strip or damage it.
    • SynthID adds watermarking that survives some edits better than metadata, making generated-image checks less brittle.
    • A public verification tool lowers friction for marketplaces, UGC apps, and client delivery workflows that need basic origin checks.
    Source: news.hada.io/topic?id=29700
  • `Google Workspace` adds voice creation, `Google Pics`, and `AI Inbox` updates

    50 radar

    Creation tools move closer to everyday docs and mail. Useful for faster content ops, but the short note lacks pricing, rollout, and capability depth.

    • Voice features are coming to Gmail, Docs, and Keep; drafting and capture workflows get lighter.
    • Google Pics is positioned as a new design tool, likely useful for quick marketing or app-store visuals.
    • AI Inbox updates signal more automated email handling, but no details on control, accuracy, or rollout.
    Source: blog.google/products-and-platforms/products/workspace/wo
  • `Nova3D` Generates Articulated 3D Objects via Blender Code

    50 radar
    Nova3D: open-source 3D generation tool that preserves parts and pivots

    Instead of mesh blobs, the pipeline asks an LLM to compile native Blender Python scene graphs. Useful as a prompt-to-code pattern, but local models still break complex transforms.

    • Nova3D exports multi-part GLB files with transform nodes and pivot axes preserved, so parts can rotate or articulate.
    • The core bet is prompt-to-code over diffusion: edit a part node instead of regenerating the whole object from text.
    • Frontend uses Flutter plus a Three.js viewport for browser rendering and node manipulation; hosted API is default.
    • Local models still hallucinate Blender matrix math on complex transforms, so BYOK Gemini is suggested for better output.
    Source: www.reddit.com/r/LocalLLaMA/comments/1thucyj/a_tool_i_bu
Tue, May 19 · 1 dispatch
  • #0008 · Generative media · OpenAI

    OpenAI Expands AI Media Provenance With `Content Credentials` and Verification

    50 radar

    Generated media will carry stronger provenance signals across credentials, watermarking, and verification. Useful for trust-heavy image or video products, but not a direct revenue lever yet.

    • Content Credentials, SynthID, and a verification tool are bundled into one provenance push — identity and trust now sit closer to generation workflows.
    • The practical impact is highest for marketplaces, UGC tools, and client-facing media apps where proof of origin reduces moderation and support friction.
    • This is not a model or pricing change. Treat it as a product requirement signal for AI media apps, not an urgent migration task.
    Source: openai.com/index/advancing-content-provenance
Mon, May 18 · 1 dispatch
  • `Luma AI` as a Creative Agent From Planning to Execution

    50 radar
    Luma AI: AI creative agent, with a workflow built on in-house generation models

    Creative tools usually stop at asset generation. This adds planning and coordination around Luma AI's own models, making it worth testing for repeatable content workflows.

    • Luma AI is positioned as an AI creative agent platform, not just a prompt-to-asset generator.
    • Its own generation models are used inside the agent flow, reducing handoff friction between ideation, production, and iteration.
    • Best fit is marketing images or short-form creative pipelines where planning, variation, and coordination matter more than one-off outputs.
    Source: yozm.wishket.com/magazine/detail/3740
Sun, May 17 · 1 dispatch
  • `SANA-WM`, a 2.6B open-source world model for 1-minute 720p video

    50 radar
    SANA-WM: open-source world model that generates long video from an image and a camera path

    A single image plus a 6-DoF camera path can produce controlled long video on one GPU. Useful signal for product mockups and scene previews, but still closer to R&D than a plug-and-play SaaS feature.

    • Input is one image + 6-DoF camera trajectory, so the value is controlled scene movement rather than text-to-video prompting.
    • Hybrid Linear Diffusion Transformer mixes frame-level Gated DeltaNet with periodic softmax to keep long rollouts coherent.
    • Single-GPU 720p 1-minute generation lowers experiment cost, but integration still depends on weights, license, and inference setup.
    Source: news.hada.io/topic?id=29572
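
To make the "6-DoF camera trajectory" input concrete: each frame's camera pose has three translation and three rotation components. The sketch below is an illustration only, not SANA-WM's input format, which the source does not specify. `Pose`, `linear_path`, and the yaw/pitch/roll convention are assumptions, and naive linear interpolation of angles is only reasonable for small rotations.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class Pose:
    """One camera pose: position (x, y, z) plus orientation in degrees (assumed convention)."""
    x: float
    y: float
    z: float
    yaw: float
    pitch: float
    roll: float


def linear_path(start: Pose, end: Pose, n_frames: int) -> list[Pose]:
    """Linearly interpolate all six degrees of freedom from start to end."""
    if n_frames < 2:
        raise ValueError("need at least two frames")
    path = []
    for i in range(n_frames):
        t = i / (n_frames - 1)  # 0.0 at the first frame, 1.0 at the last
        values = [a + (b - a) * t for a, b in zip(astuple(start), astuple(end))]
        path.append(Pose(*values))
    return path


# A 2-unit dolly forward combined with a 30-degree pan, sampled at 5 poses.
path = linear_path(Pose(0, 0, 0, 0, 0, 0), Pose(0, 0, 2, 30, 0, 0), 5)
for pose in path:
    print(pose)
```

A real trajectory would have one pose per output frame; the frame rate and pose encoding depend on the model's released inference code.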
Fri, May 15 · 1 dispatch
  • `Supertonic 3` launches ultra-light on-device TTS with 31 languages and emotion tags

    60 radar
    Supertonic 3: on-device TTS engine, multilingual with emotion tags

    A small-footprint TTS now handles expressive cues like laugh and scream, while improving pronunciation and voice cloning. Strong fit for offline narration, in-app voice UX, and cost-sensitive shipping.

    • Supports 31 languages including Korean, which lowers the friction for multilingual voice features without a cloud dependency.
    • Adds 10 emotion tags such as laugh, breath, and scream, so scripted dialogue can sound less flat with simple text markup.
    • Pronunciation accuracy improved, and repetition or omission failures were reduced; this matters more than flashy demos in production TTS.
    • Voice cloning quality also improved, making it more usable for character voices, guided audio, or branded app narration on-device.
    Source: news.hada.io/topic?id=29522
Thu, May 14 · 1 dispatch
  • `Violin`: open-source AI video translation stack

    50 radar
    Violin: open-source video translation tool that unifies `ASR`, translation, and `TTS`

    An end-to-end stack for multilingual video localization, not just a demo. Useful when you need ASR + translation + TTS without stitching vendors together.

    • Combines speech recognition, LLM translation, and text-to-speech in one flow, reducing glue code for video localization.
    • Being open-source matters more than model novelty here: you can inspect, swap, and self-host parts of the pipeline.
    • Best fit is repurposing existing video assets across languages; less compelling if you only need subtitles or basic dubbing.
    Source: www.together.ai/blog/violin-open-source-translation-skil
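
The three-stage flow can be pictured as a chain of swappable functions. This is a hypothetical sketch, not Violin's API: `localize` and the stub stages are invented here for illustration, and real stages would call actual ASR, translation, and TTS models.

```python
from typing import Callable

# Stage signatures (assumed for this sketch).
Asr = Callable[[bytes], str]             # audio -> transcript
Translate = Callable[[str, str], str]    # (text, target language) -> translated text
Tts = Callable[[str, str], bytes]        # (text, target language) -> audio


def localize(audio: bytes, target_lang: str, asr: Asr, translate: Translate, tts: Tts) -> bytes:
    """Run ASR, then translation, then TTS; each stage can be swapped independently."""
    transcript = asr(audio)
    translated = translate(transcript, target_lang)
    return tts(translated, target_lang)


# Stub stages that stand in for real models so the sketch runs end to end.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_translate(text: str, lang: str) -> str:
    return f"[{lang}] {text}"


def fake_tts(text: str, lang: str) -> bytes:
    return text.encode("utf-8")


dubbed = localize(b"hello", "ko", fake_asr, fake_translate, fake_tts)
print(dubbed)
```

Passing the stages in as arguments mirrors the "inspect, swap, and self-host" point: replacing one vendor or model means replacing one function.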
Wed, May 13 · 1 dispatch
  • `SuperSplat`: browser-based editor for 3D Gaussian Splats

    50 radar
    SuperSplat: 3D Gaussian Splat editor for browser-based editing, optimizing, and publishing

    A free open-source 3D Gaussian Splat editor runs fully in the browser, covering inspection, editing, optimization, and publishing. With no install required, the barrier to trying it is low, but it matters mainly if 3D capture or spatial media is already in your stack.

    • Covers the full loop: inspect, edit, optimize, and publish 3D Gaussian Splats in one web app, which cuts tool switching.
    • Runs in the browser with no install, so testing workflows or client-side demos is much lighter than desktop-only tools.
    • Local dev is simple: with Node.js 18+, run `npm install`, then `npm run develop`, and open `localhost:3000`. Easy to fork and customize.
    • Localization is already structured with `static/locales` plus `src/ui/localization.ts`, useful if you want a white-label or multilingual tool.
    Source: github.com/playcanvas/supersplat
Tue, May 12 · 1 dispatch
  • `Voice Finder`: search and audition **600+** TTS voices faster

    50 radar
    Voice Finder: TTS voice search tool that matches by prompt or audio

    Voice selection moves from manual browsing to prompt or reference-audio search. Useful if your app ships spoken UX, though it matters more for workflow speed than model quality.

    • Search spans 600+ voices across Together AI TTS models, cutting the time spent comparing presets by hand.
    • Natural-language prompts let you filter by tone or style, which fits rapid prototyping before wiring custom voice settings.
    • Audio-sample matching is the more practical hook: upload a reference clip and shortlist similar voices faster.
    • This is a discovery layer, not a new speech model. Shipping impact depends on whether voice choice is your current bottleneck.
    Source: www.together.ai/blog/introducing-voice-finder-a-new-tool
Fri, May 8 · 1 dispatch
  • `ACE-Step UI`: polished local frontend for `ACE-Step 1.5` music generation

    50 radar
    ACE-Step UI: music UI that streamlines local ACE-Step generation

    A Spotify-like frontend makes local AI music generation far more usable than raw model tooling. If you already have GPU headroom, free and unlimited local generation is a better deal than a monthly subscription for lightweight song prototyping.

    • Pairs a polished UI with ACE-Step 1.5, covering full songs, instrumentals, lyrics, batch runs, and prompt reuse in one flow.
    • Pushes the strongest local pitch: no subscription, no queue limits, 100% local, which matters when iterating on many variants.
    • Advanced controls go beyond consumer music apps: reference audio, cover transforms, repainting, seeds, and inference-step tuning.
    • The catch is hardware and setup friction. Value is highest for creators already comfortable running GPU-heavy tools locally.
    Source: github.com/fspecii/ace-step-ui