SeamlessExpressive enables the transfer of tones, emotional expression, and vocal styles in speech translation. You can try a demo of SeamlessExpressive using your own voice as input here.
SeamlessStreaming is a new model that enables streaming speech-to-speech and speech-to-text translation with less than 2 seconds of latency and nearly the same accuracy as an offline model. In contrast to conventional systems, which translate only after the speaker has finished a sentence, SeamlessStreaming translates while the speaker is still talking. It intelligently decides when it has enough context to output the next translated segment.
SeamlessM4T v2 is a foundational multilingual and multitask model for both speech and text. It is the successor to SeamlessM4T, demonstrating performance improvements across ASR, speech-to-speech, speech-to-text, and text-to-speech tasks.
Seamless is a model that merges capabilities from SeamlessExpressive, SeamlessStreaming, and SeamlessM4T v2 into one.
Stability AI released SDXL Turbo: a real-time Text-to-Image generation model. SDXL Turbo is based on a new distillation technology, which enables the model to synthesize image outputs in a single step and generate real-time text-to-image outputs while maintaining high sampling fidelity [Details].
Meta AI has created CICERO, the first AI agent to achieve human-level performance in the complex natural language strategy game Diplomacy. CICERO played with humans on webDiplomacy.net, an online version of the game, where CICERO achieved more than double the average score of the human players and ranked in the top 10% of participants who played more than one game [Details].
Mozilla’s innovation group and Justine Tunney released llamafile which lets you distribute and run LLMs with a single file. llamafiles can run on six OSes (macOS, Windows, Linux, FreeBSD, OpenBSD, and NetBSD) and on multiple CPU architectures [Details].
Perplexity released two new PPLX models: pplx-7b-online and pplx-70b-online. These online LLMs can leverage the most up-to-date information using the internet when forming a response [Details].
Google DeepMind presented GNoME (Graph Networks for Materials Exploration): an AI tool that discovered 2.2 million new crystal structures, with 380,000 being highly stable and promising for breakthroughs in superconductors, supercomputers, and advanced batteries for electric vehicles [Details].
Amazon introduced two new Amazon Titan multimodal foundation models (FMs): Amazon Titan Image Generator (preview) and Amazon Titan Multimodal Embeddings. All images generated by Amazon Titan contain an invisible watermark [Details].
Researchers present Animatable Gaussians, a new avatar representation method that can create lifelike human avatars from multi-view RGB videos [Details].
Pika Labs released a major product upgrade of their generative AI video tool, Pika 1.0, which includes a new AI model capable of generating and editing videos in diverse styles such as 3D animation, anime, cartoon and cinematic using text, image or existing video [Details].
Eleven Labs announced a grant program offering 11M text characters of content per month for the first 3 months to solopreneurs and startups [Details].
Researchers from UC Berkeley introduced Starling-7B, an open large language model trained using Reinforcement Learning from AI Feedback (RLAIF). It utilizes the GPT-4 labeled ranking dataset, Nectar, and a new reward training pipeline. Starling-7B outperforms every model to date on MT-Bench except for OpenAI’s GPT-4 and GPT-4 Turbo [Details].
XTX Markets is launching a new $10mn challenge fund, the Artificial Intelligence Mathematical Olympiad Prize (AI-MO Prize). The grand prize of $5mn will be awarded to the first publicly shared AI model to enter an AI-MO approved competition and perform at a standard equivalent to a gold medal in the International Mathematical Olympiad (IMO) [Details].