Nature: A large dataset of scientific text reuse in Open-Access publications

Nature: A large dataset of scientific text reuse in Open-Access publications. “We present the Webis-STEREO-21 dataset, a massive collection of Scientific Text Reuse in Open-access publications. It contains 91 million cases of reused text passages found in 4.2 million unique open-access publications. Cases range from overlap of as few as eight words to near-duplicate publications and include a variety of reuse types, ranging from boilerplate text to verbatim copying to quotations and paraphrases.”

MakeUseOf: The 4 Best Pastebin Alternatives for Sharing Code and Text

MakeUseOf: The 4 Best Pastebin Alternatives for Sharing Code and Text. “The aptly named Pastebin.com was the first text storage website of its kind. It’s used for easily storing and sharing snippets of code or text with other people online. But if you don’t care for it, you’ll find plenty of alternatives to Pastebin on the web. Let’s look at the best Pastebin alternatives you can use for storing text and code. We’ll examine their best features and why they’re worth using over the well-known service.”

WIRED: How to Extract the Text From Any Image

WIRED: How to Extract the Text From Any Image. “THERE ARE PLENTY of reasons why you might want to pull the text out of an image you find online: instructions on a YouTube still, for example, or items on a printed menu, or inspirational quotes in your Instagram feed. Whatever the reason, there are text extraction tools that will do the job of recognizing and copying the words inside those images for you. As image identification techniques improve, these tools are getting better and better at accurately converting text in an image into usable, editable text.”

New York Times: A.I. Is Mastering Language. Should We Trust What It Says?

New York Times: A.I. Is Mastering Language. Should We Trust What It Says? “GPT-3 belongs to a category of deep learning known as a large language model, a complex neural net that has been trained on a titanic data set of text: in GPT-3’s case, roughly 700 gigabytes of data drawn from across the web, including Wikipedia, supplemented with a large collection of text from digitized books. GPT-3 is the most celebrated of the large language models, and the most publicly available, but Google, Meta (formerly known as Facebook) and DeepMind have all developed their own L.L.M.s in recent years.”

UX Collective: The power of seeing only the questions in a piece of writing

UX Collective: The power of seeing only the questions in a piece of writing. “I’ve been watching how writers use questions lately, and thought: Hmmm, it’d be cool to see only the questions in a piece of prose. I probably started down this line of thinking because last fall I created a little web tool that removes everything but the punctuation from a piece of writing. That tool wound up being a pretty intriguing type of literary x-ray: I discovered, for example, that I use a ton of parentheticals (and way too many m-dashes). Since I already had the code for that, it wasn’t too hard for me to program a version [that] focuses on questions instead.”
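
For readers curious how a questions-only view might work, here is a minimal Python sketch. It is not the author's tool, whose code is not shown in the article; the function name `extract_questions` and the naive sentence-splitting rule are assumptions made for illustration.

```python
import re


def extract_questions(text: str) -> list[str]:
    """Return only the sentences in `text` that end with a question mark.

    Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    This naive rule will mis-split abbreviations such as "e.g. this".
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Keep only the pieces that end with a question mark.
    return [s for s in sentences if s.endswith("?")]


if __name__ == "__main__":
    sample = "It works. Does it? Yes! Why not?"
    for question in extract_questions(sample):
        print(question)
```

A real tool would likely need a more careful sentence splitter, since quoted questions and abbreviations break the simple rule used here.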

Make Tech Easier: 6 of the Best Online Summarizer Tools to Shorten Text

Make Tech Easier: 6 of the Best Online Summarizer Tools to Shorten Text. “Using these nifty online tools, you can copy-paste text or URLs into a box, set your parameters for just how heavily summarized you want it to be, then click a big button to get the low-down on a given article in just a few sentences. Here are our favorite tools for this purpose.”

Analytics India: Google Releases Wikipedia-Based Image Text (WIT) Dataset

Analytics India: Google Releases Wikipedia-Based Image Text (WIT) Dataset. “Google recently released a Wikipedia-Based Image Text (WIT) dataset, a large multimodal dataset created by extracting various text selections associated with an image from Wikimedia image links and articles. It was conducted by rigorous filtering to retain high-quality image-text sets.”

New York Times: Text Memes Are Taking Over Instagram

New York Times: Text Memes Are Taking Over Instagram. “Known in internet slang as shitposting, this style of posting involves people publishing low-quality images, videos or comments online. On Instagram, this means barraging people’s feeds with seemingly indiscriminate content, often accompanied by humorous or confessional commentary. A growing ecosystem of Instagram accounts has embraced this text-heavy posting style, which has exploded in popularity among Gen Z users during the pandemic.”

FedTech: History of Lorem Ipsum

FedTech: History of Lorem Ipsum. “Have you ever seen the term Lorem Ipsum on a new website? Perhaps you have even tried entering it on Google Translate, but no sensible results came through. Most people who see it the first time think they are in the wrong address only to refresh and come back to the same page. But, what is this mysterious text that you see on pages?”

The Verge: Google introducing a feature in Chrome 90 to create links to highlighted text on a webpage

The Verge: Google introducing a feature in Chrome 90 to create links to highlighted text on a webpage. “An upcoming feature in Chrome 90 will allow users to create a link to a section of a website that they’ve highlighted. First launched as a browser extension called Link to Text Fragment last year, Google has now added the feature within Chrome itself.”
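
Links of this kind append a `#:~:text=` directive to the page URL, followed by the percent-encoded highlighted text. The Python sketch below shows one way such a link could be built by hand. The helper name `text_fragment_url` and the example URL are hypothetical, and the sketch covers only the simplest form (a single exact-match phrase), not the range or prefix/suffix variants.

```python
from urllib.parse import quote


def text_fragment_url(page_url: str, highlighted: str) -> str:
    """Build a link that asks the browser to scroll to `highlighted`.

    Simplifying assumption: any fragment already on `page_url` is dropped.
    """
    # Percent-encode everything, including '&' and ','. quote() leaves '-'
    # untouched, but a dash is a separator inside the text directive, so it
    # is encoded explicitly.
    encoded = quote(highlighted, safe="").replace("-", "%2D")
    # Remove any existing '#fragment' before appending the directive.
    base = page_url.split("#", 1)[0]
    return f"{base}#:~:text={encoded}"


if __name__ == "__main__":
    # Hypothetical page and phrase, for illustration only.
    print(text_fragment_url("https://example.com/post", "open-access text"))
```

Browsers that do not support text fragments ignore the directive and simply load the page.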

Science Daily: Computer scientists develop new tool that generates videos from themed text

Science Daily: Computer scientists develop new tool that generates videos from themed text. “A global team of computer scientists, from Tsinghua and Beihang Universities in China, Harvard University in the US and IDC Herzliya in Israel, have developed ‘Write-A-Video,’ a new tool that generates videos from themed text. Using words and text editing, the tool automatically determines which scenes or shots are chosen from a repository to illustrate the desired storyline. The tool enables novice users to produce quality video montages in a simple and user-friendly manner that doesn’t require professional video production and editing skills.”