United States Patent Application 20060025214
Kind Code: A1
Smith; Darren C.
February 2, 2006
Voice-to-text chat conversion for remote video game play
Abstract
A multi-player networked video game playing system, including for example
video game consoles, analyzes speech to vary the font size and/or color of
associated text displayed to other users. If the amplitude of the voice
is high, the text displayed to other users is displayed in a larger than
normal font. If the voice sounds stressed or aggressive words are used,
the text displayed to other users is displayed using a special format,
such as red text. Other analysis may be performed on the speech in
context to vary the font size, color, font type and/or other display
attributes.
Inventors: Smith; Darren C. (Sammamish, WA)
Correspondence Name and Address:
NIXON & VANDERHYE, P.C.
901 NORTH GLEBE ROAD, 11TH FLOOR
ARLINGTON, VA 22203
US
Assignee Name and Address: Nintendo of America Inc., Redmond, WA
Serial No.: 901452
Series Code: 10
Filed: July 29, 2004
U.S. Current Class: 463/30
U.S. Class at Publication: 463/030
Intern'l Class: A63F 13/00 20060101 A63F013/00
Claims
1. A multi-player video game playing method comprising: converting a
player's speech to text; analyzing said speech and/or said text for at
least one predetermined characteristic; and displaying, to at least one
other video game player, at least a portion of said text including a
distinctive indication responsive to said analysis.
2. The method of claim 1 wherein said indication comprises a display
format.
3. The method of claim 1 wherein said indication includes font size.
4. The method of claim 1 wherein the indication includes font color.
5. The method of claim 1 wherein the indication includes punctuation.
6. The method of claim 1 wherein the indication includes font style.
7. The method of claim 1 wherein the characteristic comprises amplitude.
8. The method of claim 1 wherein the characteristic comprises the use of
predetermined stress words.
9. The method of claim 1 wherein the characteristic comprises emotion.
10. The method of claim 1 wherein the characteristic comprises a threat.
11. Video game playing equipment comprising: a computing device executing
video game play instructions; a microphone that, in use, receives speech
from at least one human game player; a speech-to-text converter that
converts said received speech into text; an analyzer that analyzes said
speech and/or text to determine whether at least one predetermined
characteristic is present; and a text formatter that formats said text
for display at least in part in response to said analyzer determination.
12. A digital storage medium comprising: a first program instruction
storage area storing video game play instructions; a second instruction
storage area storing instructions for converting speech into text; a
third instruction storage area storing instructions that analyze said
speech and/or text to determine whether a predetermined characteristic is
present; and a fourth instruction storage area that stores display format
instructions that format said text for display based at least in part on
said analysis performed by said analyzing instructions.
13. A video game chat system comprising: a plurality of video game play
sites, each said site including a user input device and a display, said
display providing interactive video game play in response to user inputs
said user input device provides, wherein at least one of said sites
further includes an audio transducer that picks up speech; a speech
recognizer coupled to said audio transducer, said speech recognizer
converting said speech into displayable indicia and further analyzing
said speech to determine whether a predetermined characteristic is
present therein; and a display formatter that displays said displayable
indicia on at least one of said displays, said display formatter formatting
said display at least in part in response to whether said predetermined
characteristic is present.
Description
CROSS-REFERENCES TO RELATED APPLICATIONS
[0001] This case is related to commonly assigned copending patent
application Serial No. ______, entitled "Video Game Voice Chat With
Amplitude-Based Virtual Range" (attorney docket 723-1488), incorporated
herein by reference.
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
[0002] Not applicable.
FIELD
[0003] The technology herein relates to remote or networked video game
play, and more particularly to networked video game play wherein remote
users can communicate with one another. In still more detail, the
technology herein relates to method and apparatus providing remote video
game play wherein a player's speech is converted into text chat and
responsively formatted for textual display or other indication at remote
player sites.
BACKGROUND AND SUMMARY
[0004] Networked and remote video game play has become increasingly
popular. For several years now, game players using personal computers
have played Doom, Quake and other multiplayer networked games over the
Internet. Such multiplayer games can involve a number of different game
players from all over the country or the world.
[0005] One especially interesting genre of remote video games uses a team
approach where the various players align themselves in teams and work
together to accomplish a particular objective (defeat another team, beat
another team in locating a treasure or fulfilling some other quest,
etc.). It is useful in these and other multiplayer video game contexts to
allow the various game players to communicate with one another during
game play. For example, members of the same team may wish to strategize
so they can work together more effectively. Sometimes, players on
opposite sides of a challenge may wish to communicate information or
otherwise coordinate their game play. Adding an inter-player
communications capability raises the fun factor substantially. Rather
than simply sitting alone in front of a computer or television set moving
a game character on a screen, the game play experience becomes much more
interactive and personal when one is communicating with a group of
friends or acquaintances.
[0006] While some game players have been known to talk together on the
telephone while they are involved in remote game play, many in the gaming
industry have sought to provide a chat capability as a part of or as an
adjunct to the video game software. Early approaches, especially on PC
games, provided a text chat capability allowing players to send text
messages to one another. A player would use the keyboard to type in a
message which was instantly sent over the same communications medium
carrying interactive game play information back and forth. Such text
messages could be replied to by other players in the same way to provide
interactive text "chat" communications.
[0007] The effectiveness of such text chat capabilities depended on the
type of game. For a relatively slow-moving long term adventure or other
game, text chat could be quite effective in allowing players to
coordinate their activities while at the same time communicating fun and
interesting information about themselves. However, because of the
required use of a keyboard to input the text information, many players
found text chat to be somewhat incompatible with other types of games
such as more fast-moving interactive games with time pressure. Many
personal computer and other games are primarily controlled through use of
a joystick or other game type controller. To send a text chat message,
the user generally needed to move his or her hands off of the game
controller onto a keyboard to begin typing. Once the user finished typing
a message, he or she hit a "send" button and then returned to interacting
with the video game using the joystick or other game controller. While
the user's hands were on the keyboard, the user was often unable to
interact with the game via the joystick. Such interruptions were found to
be generally undesirable. Furthermore, not all game players have good
typing skills. Younger game players or those who have not yet learned to
touch type often found the keyboard to be an obstacle that tended to slow
down fast-moving video game play.
[0008] To solve this problem and also take advantage of the relatively
higher communications bandwidths now available to most gamers via DSL,
cable or other communications means, several software developers and game
companies developed voice chat capabilities for use in remote video game
play. To use voice chat, game players typically put on headsets that
include both earphones and a microphone. Software and hardware within the
personal computer or gaming platform digitizes voice picked up by the
microphone and transmits the resulting digital information to other game
players. At the remote side, received digitized speech signals are
converted back into audio, amplified and played back through remote game
players' headsets. Voice chat eliminates the need for game players to use
a keyboard while providing nearly instantaneous inter-player
communications and coordination.
[0009] While voice chat has been widely adopted in the gaming community
and has achieved a fair degree of success, text chat is still being used
by some because of several advantages it provides over voice chat.
Communication with other online players in massively multiplayer online
role playing games, for example, is still often provided by text chat
rather than voice chat. Text chat provides a record of conversations so
that players can review exactly what was said by other players, and also
provides the ability to easily identify the player who sent a particular
message (text can be tagged with a speaker's identity). In addition,
using text chat, one player's statements can be easily separated from
another player's statements since the text typically appears separately
(this can also be done with voice chat using a half-duplex type
communications system, but this might be somewhat frustrating to the
speakers). Additionally, unlike most voice chat, text chat provides the
ability to mask the player's true identity. This can be useful when the
game play includes avatars that in effect provide an "alter ego" for each
human player. For example, if a 12 year old boy is playing the role of a
40 year old warrior, voice chat can spoil or detract from the game play
experience since the warrior ends up having the voice of a 12 year old.
Additional advantages of text chat include the ability to monitor and
censor player conversations for bad language, and reduction in the amount
of bandwidth required to convey the information.
[0010] Despite the continued usefulness of text chat in some game play
contexts, using the keyboard continues to have significant disadvantages,
especially for console or other game platforms that do not include
keyboards. A keyboard is a bulky accessory, and it detracts from game
play if the user has to remove his hands from the controller to type a
message. The impersonation problem with voice chat can be addressed by
providing voice filters that alter the sound of a player's voice, but so
far players have not generally been using such voice masking since the
resulting sound quality can be relatively low and intelligibility ends up
being sacrificed.
[0011] In some non-gaming contexts (e.g., America Online's Instant
Messenger), some have attempted to provide a chat alternative in the form
of voice-to-text conversion. However, further improvements in the gaming
context are necessary and desirable if such techniques are to become more
widely adopted.
[0012] The technology herein addresses these problems by providing a video
game chat capability with voice-to-text conversion that identifies
characteristics of the player's speech and selects text display
formatting based on such identified characteristics. In more detail, a
non-limiting illustrative exemplary implementation runs on a video game
console or associated server and analyzes the player's speech to vary the
font size, color or other text display formatting for display to other
users. For example, if the amplitude of a player's voice is high, the
text may be displayed to other users in a larger than normal font. If the
voice sounds stressed or aggressive words are used, the text is displayed
to other users in a special format (e.g., using a distinctive color such
as red or other distinctive formatting). Other analysis may be done on
speech in context to vary the text formatting options such as font size,
color, font type, or other aspects of the text presentation and/or
display.
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] These and other features and advantages will be better and more
completely understood by referring to the following detailed description
in conjunction with the drawings of which:
[0014] FIG. 1 is a schematic illustration of an exemplary, illustrative
non-limiting implementation;
[0015] FIG. 2 is a schematic diagram of an exemplary, illustrative
non-limiting voice-to-text conversion;
[0016] FIG. 3 is a flowchart of an exemplary, illustrative non-limiting
text formatting and display; and
[0017] FIG. 4 shows an example illustrative non-limiting implementation of
a program instruction storage medium.
DETAILED DESCRIPTION
[0018] FIG. 1 schematically shows an example non-limiting illustrative
implementation of a multi-player gaming system 10. In the example
implementation shown, video game player 12(1) plays a video game against
another video game player 12(2) (any number of players can be involved).
Video game players 12(1) and 12(2) may be remotely located, with
communications being provided between them via a network 14 such as the
Internet or any other signal path capable of carrying game play data or
other signals. In the example system 10 shown, each game player 12 has
available to him or her electronic video game playing equipment 16. In
the example shown, video game playing equipment 16 may comprise for
example a home video game platform such as a NINTENDO GAMECUBE system
connected to a handheld game controller 18 and a display device 20 such
as a home color television set. In other examples, game playing equipment
16 could comprise a handheld networked video game platform such as a
NINTENDO DS or GAMEBOY ADVANCE, a personal computer including a monitor
and appropriate input device(s), a cellular telephone, a personal digital
assistant, or any other electronic or other appliance.
[0019] In the example system 10 shown, each of players 12 has a headset 22
including earphones 24 and a microphone 26. Earphones 24 receive audio
signals from game playing equipment 16 and play them back into player
12's ears. Microphone 26 receives acoustical signals (e.g., speech spoken
by a player 12) and provides associated audio signals to the game playing
equipment 16. In other exemplary implementations, microphone 26 and
earphones 24 could be separate devices, or a loudspeaker and an appropriate
feedback-canceling microphone could be used instead. In the example shown
in FIG. 1, both of players 12(1) and 12(2) are equipped with a headset
22, but depending upon the context it may be that only some subset of the
players have such equipment.
[0020] In the example system 10 shown, each of players 12 interacts with
video game play by inputting commands via a handheld controller 18 and
watching a resulting display (which may be audio visual) on a display
device 20. Software and/or hardware provided by game playing platforms 16
produce interactive 2D or 3D video game play and associated sound. In the
example shown, each instance of game playing equipment 16 provides
appropriate functionality to produce local video game play while
communicating sufficient coordination signals for other instances of the
game playing equipment to allow all players 12 to participate in the
"same" game. In some contexts, the video game could be a multiplayer
first person shooter, driving, sports or any other genre of video game
wherein each of players 12 can manipulate an associated character or
other display object by inputting commands via handheld controllers 18.
For example, in a sports game, one player 12(1) could control the players
of one team, while another player 12(2) could control the players on an
opposite team. In a driving game, each of players 12(1), 12(2) could
control a respective car or other vehicle. In a flight or space
simulation game, each of players 12 may control a respective aircraft. In
a multi-user role playing game, each of players 12 may control a respective
avatar that interacts with other avatars within the virtual environment
provided by the game. Any number of players may be involved depending
upon the particular game play.
[0021] As shown in FIG. 1, a game server 28 may optionally be
provided to coordinate game play. For example, in the case of a complex
multiplayer role playing game having tens or even hundreds of players 12
who can play simultaneously, a game server 28 may be used to keep track
of the master game playing database and to provide updates to each
instance of game playing equipment 16. In other game playing contexts, a
game server 28 may not be necessary with all coordination being provided
directly between the various instances of game playing equipment 16.
[0022] In the particular example system 10 shown in FIG. 1, a
voice-to-text chat capability is provided. As can be seen, player
12(1) in this particular example is speaking the following words into his
or her microphone 26:
[0023] "I'm going to blast you."
[0024] In response to this statement, game playing equipment 16 and/or
game server 28 converts the spoken utterance into data representing
associated text along with formatting information responsive to detected
characteristics of the utterance. For example, the speech-to-text
converter may recognize the term "blast" as being a special "threat"
term, and cause the resulting text message to be displayed on the other
player(s)' display 20(2) using a special format such as for example: "I'm
going to BLAST you."
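The paragraph above says the utterance becomes "data representing associated text along with formatting information." A minimal sketch of such a message follows, assuming a JSON payload and a tiny hypothetical threat vocabulary (`THREAT_TERMS`, `ChatMessage` and `build_message` are illustrative names, not taken from the description). It reproduces the "BLAST" example by upper-casing the threat term and tagging the message red.

```python
import json
from dataclasses import asdict, dataclass, field

# Hypothetical threat vocabulary: the text names "blast"; "shoot" is an
# illustrative addition.
THREAT_TERMS = {"blast", "shoot"}


@dataclass
class ChatMessage:
    """Recognized text plus formatting hints sent to remote players."""
    player_id: str
    text: str
    color: str = "normal"
    emphasized_words: list = field(default_factory=list)


def build_message(player_id: str, words: list) -> ChatMessage:
    """Upper-case recognized threat terms and mark the message as a threat."""
    out, flagged = [], []
    for word in words:
        core = word.strip(".,!?").lower()  # ignore trailing punctuation
        if core in THREAT_TERMS:
            out.append(word.upper())
            flagged.append(core)
        else:
            out.append(word)
    color = "red" if flagged else "normal"
    return ChatMessage(player_id, " ".join(out), color, flagged)


msg = build_message("12(1)", ["I'm", "going", "to", "blast", "you."])
payload = json.dumps(asdict(msg))  # what might travel over network 14
```

On the receiving side, equipment 16 would parse the payload and apply the color and emphasis when rendering on display 20(2). Keeping text and formatting as separate fields lets each receiver choose how to render "red" or "emphasized" on its own hardware.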
[0025] The special formatting may be the use of all capital letters, use
of a special size or style of font (e.g., italics, bold, or some other
special typeface), the use of a special color (e.g., red for threats,
blue for statements of friendship, green for statements of emotion,
yellow for statements of fear, etc.), or any other sort of distinctive
visual, aural or other indication.
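The color scheme above (red for threats, blue for friendship, green for emotion, yellow for fear) can be sketched as a lookup from category to color. The description does not say how a statement is assigned to a category, so the keyword sets below are hypothetical placeholders, and checking categories in dictionary order (threats first) is an assumed priority rule.

```python
# Colors follow the examples in the text; the keyword sets are hypothetical
# placeholders, since the description does not specify how categories are
# detected.
CATEGORY_COLORS = {
    "threat": "red",
    "friendship": "blue",
    "emotion": "green",
    "fear": "yellow",
}

CATEGORY_KEYWORDS = {
    "threat": {"blast", "shoot", "destroy"},
    "friendship": {"friend", "ally", "thanks"},
    "emotion": {"love", "hate", "happy"},
    "fear": {"afraid", "scared", "help"},
}


def color_for(text: str, default: str = "white") -> str:
    """Return the color of the first category whose keywords appear in text.

    Dicts keep insertion order, so "threat" is checked first and wins ties.
    """
    words = {w.strip(".,!?\"'").lower() for w in text.split()}
    for category, keywords in CATEGORY_KEYWORDS.items():
        if words & keywords:
            return CATEGORY_COLORS[category]
    return default
```

For instance, "Thanks, friend!" maps to blue and "Help, I'm scared" maps to yellow under these placeholder lists.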
[0026] As another example shown in FIG. 1, suppose player 12(1) says "I'm
going to blast you!" in a loud voice emphasizing the word "you." The
non-limiting exemplary speech-to-text converter in the example system 10
shown in FIG. 1 recognizes the increased amplitude and/or different
inflection or emphasis placed on the word "you" and may provide an
associated display on the other player(s)' display 20(2) that includes
punctuation, formatting or other indications emphasizing the displayed
text "you," for example: "I'm going to blast you!"
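Emphasizing a single loud word, as in the "you!" example, requires a loudness value per word rather than per utterance. The sketch below assumes the recognizer supplies one such value for each word (for example, an amplitude measured over that word's samples), which the description does not specify, and marks words louder than a hypothetical 1.5 times the utterance mean by wrapping them in asterisks as a stand-in for bold or colored display.

```python
def emphasize_loud_words(words, amplitudes, ratio=1.5):
    """Wrap words spoken noticeably louder than the utterance mean in *...*.

    `amplitudes` is a hypothetical per-word loudness list aligned with
    `words`; `ratio` is an illustrative sensitivity setting.
    """
    if len(words) != len(amplitudes):
        raise ValueError("need one amplitude per word")
    if not words:
        return ""
    mean = sum(amplitudes) / len(amplitudes)
    out = []
    for word, amp in zip(words, amplitudes):
        out.append(f"*{word}*" if amp > ratio * mean else word)
    return " ".join(out)
```

With per-word levels of 0.2, 0.2, 0.2, 0.25 and 0.6 for "I'm going to blast you!", only the last word exceeds 1.5 times the mean (0.29), so the output is "I'm going to blast *you!*". Comparing against the utterance's own mean, rather than a fixed level, keeps the emphasis relative to how loudly that player normally speaks.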
[0027] Such recognition may be in context, on a word-by-word or
sound-by-sound basis, or using any other characteristic such as speech
loudness, speech pitch, speech tone, whether the player is shouting or
whispering, articulation, inflection, language (e.g., English, French,
German, Japanese, etc.), vocabulary, pauses or any other characteristic
of speech. The associated formatting based on the recognition of such
predetermined characteristic can take any form such as size of displayed
text, color of displayed text, language of displayed text, timing of
displayed text, other information displayed along with text, sounds
played while text is being displayed, scrolling or other movement of
displayed text, introduction of visual or audio effects highlighting
displayed text, selection of different displays for displaying displayed
text, selection of portions of display 20 for displaying displayed text,
or any other attribute perceptible by player 12(2).
[0028] FIG. 2 shows an example illustrative non-limiting implementation of
a speech-to-text converter 50 that may be used by example system
10--either in or with game playing equipment 16, within game server 28 or
both. In the example shown, analog speech received from a microphone 26
is converted into digital form by an analog-to-digital converter 52 and
presented to both a phoneme pattern matcher 54 and an amplitude measurer
56. Phoneme pattern matcher 54 attempts to recognize phoneme patterns
within the incoming speech stream. Such phoneme recognition output is
provided to a word pattern matching block 58 that recognizes words in
whatever language the speaking player 12 is using. Blocks 54,
58 are conventional and may be supplied by any suitable speech-to-text
conversion algorithm as is well known by those skilled in the art.
[0029] In the example shown, amplitude measurement block 56 provides an
average amplitude output indicating the amplitude or loudness at which
the speaking player 12 spoke the words into the microphone.
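The description does not define "average amplitude" precisely. A minimal sketch of amplitude measurer 56 follows, assuming that analog-to-digital converter 52 delivers 16-bit little-endian mono PCM and that the measure is the mean absolute sample value scaled to 0..1 (RMS would be an equally plausible choice). The dBFS helper is an optional convenience for expressing thresholds in decibels.

```python
import array
import math
import sys

FULL_SCALE = 32768  # magnitude of the most negative signed 16-bit sample


def average_amplitude(pcm: bytes) -> float:
    """Mean absolute amplitude of little-endian 16-bit mono PCM, scaled to 0..1."""
    samples = array.array("h")  # signed 16-bit integers on common platforms
    samples.frombytes(pcm)
    if sys.byteorder == "big":
        samples.byteswap()  # input is little-endian; convert to native order
    if len(samples) == 0:
        return 0.0
    return sum(abs(s) for s in samples) / (len(samples) * FULL_SCALE)


def to_dbfs(level: float) -> float:
    """Express a 0..1 level in decibels relative to full scale."""
    return 20 * math.log10(level) if level > 0 else float("-inf")
```

A square wave at half of full scale (samples of +16384 and -16384) yields an average amplitude of 0.5, or about -6.02 dBFS.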
[0030] As shown in FIG. 3, the amplitude and content (word recognition)
outputs provided by the FIG. 2 example speech-to-text converter are
analyzed using an illustrative, non-limiting exemplary analysis routine
that detects characteristics in the incoming speech signals. In the
particular illustrative non-limiting example shown, the analyzer 60
determines whether a recognized word is a known stress word such as
"blast", "friend", "enemy", "shoot", or other special word (decision
block 62). If the word is a known stress word ("yes" exit from decision
block 62), then the analyzer 60 may add appropriate formatting
information such as for example "display color=red" (block 64).
Similarly, if the average amplitude of the utterance is above a certain
threshold level A (as tested for by decision block 66), analyzer 60 may
similarly provide appropriate formatting such as color, font, etc. (block
64). In the example shown, if the recognized word is not a known stress
word and the average amplitude does not exceed threshold level A ("no"
exit from decision block 66), then the analyzer 60 may decide to
display the associated text in a normal color (block 68), but may perform
a further test to determine whether the amplitude is above a threshold B
(which may, for example, be lower than threshold A) (decision block 70).
If the amplitude level is higher than B ("yes" exit from decision block
70), then the analyzer may increment the font size to produce a larger
font, an all-caps display, or any other perceptible indicia (block 72).
Otherwise, the analyzer 60 may set the font size to "normal" (block 74).
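The FIG. 3 decision flow can be sketched as a single function, with the flowchart block numbers in comments. The stress words come from the example list in the text; the threshold values are hypothetical (on a normalized 0..1 amplitude scale, with B below A), using red for both block 64 cases is an assumption, and the text does not say whether block 64 also continues to the font-size test, so this sketch assumes it does not.

```python
from dataclasses import dataclass

# Stress words from the example list in the text; thresholds are
# hypothetical values on a normalized 0..1 amplitude scale (B < A).
STRESS_WORDS = {"blast", "friend", "enemy", "shoot"}
THRESHOLD_A = 0.5
THRESHOLD_B = 0.3


@dataclass
class TextFormat:
    color: str = "normal"
    font_size: str = "normal"


def analyze(word: str, avg_amplitude: float) -> TextFormat:
    """Sketch of the analyzer 60 decision logic of FIG. 3."""
    fmt = TextFormat()
    is_stress = word.strip(".,!?").lower() in STRESS_WORDS  # block 62
    if is_stress or avg_amplitude > THRESHOLD_A:            # blocks 62, 66
        fmt.color = "red"                                   # block 64 (assumed red)
        return fmt                                          # assumed: no size test
    fmt.color = "normal"                                    # block 68
    if avg_amplitude > THRESHOLD_B:                         # block 70
        fmt.font_size = "large"                             # block 72
    else:
        fmt.font_size = "normal"                            # block 74
    return fmt
```

For example, `analyze("hello", 0.4)` falls between the two thresholds and yields a normal color with a large font, while `analyze("Enemy!", 0.0)` is flagged red by the stress-word test alone.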
[0031] In one exemplary illustrative non-limiting implementation, the
analyzer 60 may perform additional functionalities such as for example
filtering or replacement of words (e.g., to screen out bad language).
Word substitution is possible using for example a database of word
substitutions. The display instructions 108 shown in FIG. 4 may provide a
conventional scroll-back capability so that game players 12 can scroll
back and review a history of some substantial portion of the text
resulting from previous game play. This provides a record for ready
reference. Different display text may be tagged with the identity of the
player who uttered the associated speech so that different statements can
be attributed to different players.
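Word substitution, scroll-back and speaker tagging can be combined in one small log object. The sketch below is illustrative only: the substitution table is a hypothetical stand-in for the "database of word substitutions", the `ChatLog` class and its 200-line default capacity are assumptions, and a bounded `collections.deque` is used so that the oldest lines drop off once the history is full.

```python
from collections import deque

# Hypothetical substitution table standing in for the "database of word
# substitutions"; a real filter list would be far larger.
SUBSTITUTIONS = {"stupid": "silly", "noob": "newcomer"}


class ChatLog:
    """Scroll-back buffer of filtered chat lines tagged with the speaker."""

    def __init__(self, max_lines: int = 200):
        self._lines = deque(maxlen=max_lines)  # oldest lines drop off first

    @staticmethod
    def filter_words(text: str) -> str:
        """Replace listed words, keeping any trailing punctuation."""
        out = []
        for word in text.split():
            core = word.strip(".,!?")
            replacement = SUBSTITUTIONS.get(core.lower())
            out.append(word.replace(core, replacement) if replacement else word)
        return " ".join(out)

    def add(self, player_id: str, text: str) -> str:
        """Filter a line, tag it with the speaker and store it."""
        line = f"[{player_id}] {self.filter_words(text)}"
        self._lines.append(line)
        return line

    def scroll_back(self, count: int) -> list:
        """Return up to `count` most recent lines, oldest first."""
        return list(self._lines)[-count:] if count > 0 else []
```

For example, `ChatLog().add("12(1)", "You stupid noob!")` stores `[12(1)] You silly newcomer!`. Filtering before storage means the scroll-back history never contains the screened words.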
[0032] FIG. 4 shows an example storage medium 100 that stores instructions
for execution by game playing equipment 16 and/or game server 28. Such
instructions may include for example game play instructions 102, speech
recognition instructions 104 implementing the functionality shown in FIG.
2, analyzer instructions 106 implementing the analyzer functionality
shown in FIG. 3, and display instructions 108 for providing visually
perceptible formatted textual displays on display device 20.
[0033] While the technology herein has been described in connection with
exemplary illustrative non-limiting embodiments, the invention is not to
be limited by the disclosure. The invention is intended to be defined by
the claims and to cover all corresponding and equivalent arrangements
whether or not specifically disclosed herein.