Lecture 01
Attendance is expected - must attend lab you are enrolled in
Opportunity to work on course assignments with TA support
Labs will begin next week - Monday (1/12)
This course is graded 100% on your coursework (there are no exams).
We will be assessing you based on the following:
| Assignment | Type | Value | n | Assigned |
|---|---|---|---|---|
| Homeworks | Team | 30% | 5/6 | ~ Every other week |
| Midterms | Individual | 40% | 2 | ~ Week 6 and 14 |
| Project | Team | 10% | 1 | ~ Week 10 |
| Quizzes | Individual | 20% | ~10-12 | ~ Wee |
Roughly biweekly homework assignments
Open ended, ~5 - 15 hours of work
Peer evaluation after completion
Expectations and roles:
Only work that is clearly assigned as team work should be completed collaboratively (Homeworks + Project).
Individual assignments (Midterms) must be completed individually; you may not directly share or discuss answers / code with anyone other than myself and the TAs.
On Homeworks you should not directly share answers / code with other teams; however, you are welcome to discuss the problems in general and ask for advice.
We are aware that a huge volume of code is available on the web, and many tasks may have solutions posted.
Unless explicitly stated otherwise, this course’s policy is that you may make use of any online resources (e.g. Google, StackOverflow, etc.) but you must explicitly cite where you obtained any code you directly use or use as inspiration in your solution(s).
Any recycled/copied code that is not explicitly cited will be treated as plagiarism, regardless of source.
The same applies to the use of LLMs like ChatGPT, Claude, Gemini, or GitHub Copilot - you are welcome to make use of these tools as the basis for your solutions but you must cite the tool when using it for significant code generation.
AI tools are not a replacement for understanding the material, but they can be a tool to help you understand the material.
Reading code and writing code are skills that take time and practice to develop - both are essential.
Nature of the tools is changing rapidly - Autocomplete vs ChatBots vs Agentic
To uphold the Duke Community Standard:
- I will not lie, cheat, or steal in my academic endeavors;
- I will conduct myself honorably in all my endeavors; and
- I will act if the Standard is compromised.
To reduce friction, the preferred method is to use the department’s RStudio server(s).
To access RStudio/Posit Workbench:
If you cannot access RStudio via the DSS servers:
Make sure you are on an authenticated Duke network (e.g. DukeBlue or VPN)
Make sure you are not using a custom DNS server (such as 1.1.1.1 or 8.8.8.8)
As an emergency backup you can use a Docker container from Duke OIT
Reserve a Container and find a container for any Sta course
If working locally you should make sure that your environment meets the following requirements:
latest R (4.5.2)
latest RStudio (2026.01.0)
working git installation
ability to create ssh keys (for GitHub authentication)
All R packages updated to their latest version from CRAN
Support policy for local installs - we will try to help you troubleshoot if we can but reserve the right to tell you to use the dept server.
We will be using a GitHub organization for this course: github.com/sta323-sp26
All assignments will be distributed and collected via GitHub
All of your work and your membership (enrollment) in the organization is private
We will be distributing a survey this weekend to collect your GitHub account names
All course related repositories will be created for you
Create a GitHub account if you don’t have one
Complete the course survey
Make sure you can log in to the Department’s RStudio server https://rstudio.stat.duke.edu
Set up SSH key authentication with GitHub, see https://github.com/DukeStatSci/github_auth_guide
The fundamental building blocks of data in R are vectors (collections of related values, objects, etc).
R has two types of vectors:
atomic vectors (vectors): collections of values that all share a single type (e.g. all true/false values, all numbers, or all character strings).
generic vectors (lists)
R has six atomic vector types.
| typeof() | mode() |
|---|---|
| logical | logical |
| double | numeric |
| integer | numeric |
| character | character |
| complex | complex |
| raw | raw |
We can check the type of any object in R using the typeof() function. mode() is a higher level abstraction used to group similar types together.
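As a quick illustration of the table above, the sketch below calls typeof() and mode() on one value of each atomic type. The variable names (x_lgl, x_dbl, and so on) are made up for this example.

```r
# Hypothetical example values, one per atomic type
x_lgl <- TRUE
x_dbl <- 1.5
x_int <- 2L          # the L suffix requests an integer
x_chr <- "a"
x_cpl <- 1+2i
x_raw <- as.raw(255)

typeof(x_lgl)  # "logical"
typeof(x_dbl)  # "double"
typeof(x_int)  # "integer"
typeof(x_chr)  # "character"
typeof(x_cpl)  # "complex"
typeof(x_raw)  # "raw"

# double and integer are distinct types but share one mode
mode(x_dbl)    # "numeric"
mode(x_int)    # "numeric"
```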
logical - boolean values (TRUE and FALSE)
R will let you use T and F as shortcuts to TRUE and FALSE; this is bad practice as these values are actually global variables that can be overwritten.
character - text strings
Either single or double quotes are fine; the opening and closing quote must match.
double - floating point values (these are the default numerical type)
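A small sketch of the three points above. The name t_masked is hypothetical, and local() is used only so that the reassignment of T does not leak into the rest of the session.

```r
# logical
typeof(TRUE)             # "logical"

# T is an ordinary variable, so it can be reassigned (TRUE cannot)
t_masked <- local({
  T <- FALSE             # masks the built-in T inside this block only
  T
})
t_masked                 # FALSE

# character: either quote style gives the same string
identical('abc', "abc")  # TRUE
typeof("abc")            # "character"

# double is the default numeric type; an L suffix gives an integer
typeof(1)                # "double"
typeof(1L)               # "integer"
```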
Atomic vectors can be constructed using the combine c() function.
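For example, the following sketch builds a few atomic vectors with c(); the values and names are arbitrary.

```r
nums <- c(1, 2, 3)
strs <- c("Hello", "World!")
lgls <- c(TRUE, FALSE)

typeof(nums)   # "double"
length(strs)   # 2

# c() can also combine existing vectors; the result is flat, not nested
flat <- c(1, c(2, c(3, 4)))
flat           # 1 2 3 4
length(flat)   # 4
```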
is.logical(x) - returns TRUE if x has type logical.
is.character(x) - returns TRUE if x has type character.
is.double(x) - returns TRUE if x has type double.
is.integer(x) - returns TRUE if x has type integer.
is.numeric(x) - returns TRUE if x has mode numeric.
is.atomic(x) - returns TRUE if x is an atomic vector.
is.list(x) - returns TRUE if x is a list (generic vector).
is.vector(x) - returns TRUE if x is either an atomic or generic vector.
R is a dynamically typed language – it will automatically convert between most types without raising warnings or errors. Keep in mind that atomic vectors must always contain values of the same type.
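The sketch below tries several of these predicates on a double vector and on a list, and then shows one case of silent conversion when types are mixed inside c(). The names x, y, and mixed are illustrative only.

```r
x <- c(1, 2, 3)          # a double vector
is.double(x)    # TRUE
is.integer(x)   # FALSE
is.numeric(x)   # TRUE
is.atomic(x)    # TRUE
is.list(x)      # FALSE
is.vector(x)    # TRUE

y <- list(1, "a", TRUE)  # a generic vector (list)
is.atomic(y)    # FALSE
is.list(y)      # TRUE
is.vector(y)    # TRUE

# Mixing types in c() triggers conversion with no warning or error
mixed <- c(1, "Hello")
typeof(mixed)   # "character"
```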
Built-in operators and functions (e.g. +, &, log(), etc.) will generally attempt to coerce values to an appropriate type for the given operation (numeric for math, logical for logical, etc.)
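A few illustrative cases of this operator coercion follow; the specific inputs are arbitrary.

```r
# Math operators coerce logicals to numbers (TRUE is 1, FALSE is 0)
TRUE + TRUE            # 2
typeof(TRUE + TRUE)    # "integer"
3.1 + 1L               # 4.1 (integer promoted to double)
log(TRUE)              # 0

# Logical operators coerce numbers to logicals (0 is FALSE, nonzero is TRUE)
5 & TRUE               # TRUE
0 & TRUE               # FALSE
```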
Most of the is.*() functions we just saw have an as.*() variant which can be used for explicit coercion.
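A minimal sketch of explicit coercion with a handful of the as.*() functions. The inputs are arbitrary, and suppressWarnings() is used here only to keep the failed conversion quiet.

```r
as.logical(5.2)      # TRUE
as.character(TRUE)   # "TRUE"
as.integer(pi)       # 3 (the fractional part is dropped)
as.numeric(FALSE)    # 0
as.double("7.2")     # 7.2

# A string that does not parse as a number becomes NA (with a warning)
bad <- suppressWarnings(as.double("one"))
bad                  # NA
```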
R uses NA to represent missing values in its data structures.
What may not be obvious is that there are different NAs for the different atomic types.
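The typed NA constants can be inspected directly, as in this short sketch:

```r
typeof(NA)             # "logical" (the default NA)
typeof(NA_integer_)    # "integer"
typeof(NA_real_)       # "double"
typeof(NA_character_)  # "character"

# A plain NA is coerced to match the values around it
typeof(NA + 1)         # "double"
typeof(c(NA, "a"))     # "character"
```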
As NAs represent missing values (most) calculations using them return a missing value.
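Some examples of how missing values propagate; the vector v is made up for illustration.

```r
1 + NA                 # NA
sqrt(NA)               # NA
NA == 1                # NA
NA == NA               # NA (two unknowns may or may not be equal)

v <- c(1, 2, 3, NA)
mean(v)                # NA
mean(v, na.rm = TRUE)  # 2 (missing values dropped first)

# Because == returns NA, use is.na() to test for missingness
is.na(v)               # FALSE FALSE FALSE TRUE
```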
A useful mental model for NAs is to consider them as an unknown value that could take any of the possible values for a given type.
For numbers or characters this isn’t helpful, but for a logical value it must either be TRUE or FALSE which is relevant for certain calculations.
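The logical case can be checked directly: when the result is the same whether the unknown value is TRUE or FALSE, R returns that result instead of NA.

```r
TRUE & NA    # NA    (depends on the unknown value)
FALSE & NA   # FALSE (FALSE & anything is FALSE)
TRUE | NA    # TRUE  (TRUE | anything is TRUE)
FALSE | NA   # NA    (depends on the unknown value)
```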
These are defined as part of the IEEE floating point standard (not unique to R)
NaN - Not a number
Inf - Positive infinity
-Inf - Negative infinity
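These values typically arise from ordinary floating point arithmetic, for example:

```r
1 / 0         # Inf
-1 / 0        # -Inf
0 / 0         # NaN
Inf - Inf     # NaN
1 / Inf       # 0
sqrt(Inf)     # Inf

typeof(Inf)   # "double"
typeof(NaN)   # "double"
```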
Inf and NaN
There are predicate functions for testing for these types of values.
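Base R provides is.finite(), is.infinite(), and is.nan() for this. The sketch below applies them, along with is.na() for comparison, to a hypothetical vector x.

```r
x <- c(1, Inf, -Inf, NaN, NA)

is.finite(x)    # TRUE  FALSE FALSE FALSE FALSE
is.infinite(x)  # FALSE TRUE  TRUE  FALSE FALSE
is.nan(x)       # FALSE FALSE FALSE TRUE  FALSE
is.na(x)        # FALSE FALSE FALSE TRUE  TRUE
```

Note that is.na() is TRUE for NaN as well as NA, while is.nan() is TRUE only for NaN.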
First, remember that Inf, -Inf, and NaN are doubles; however, their coercion behavior is not the same as that of other doubles.
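A short sketch of how these special values behave under explicit coercion. suppressWarnings() is used only to silence the warning that the integer conversion of Inf emits.

```r
# To logical: infinities count as nonzero, NaN becomes missing
as.logical(Inf)    # TRUE
as.logical(-Inf)   # TRUE
as.logical(NaN)    # NA

# To integer: there is no integer infinity or NaN, so the result is NA
suppressWarnings(as.integer(Inf))   # NA
suppressWarnings(as.integer(NaN))   # NA

# To character: the printed name is kept as a string
as.character(Inf)  # "Inf"
as.character(NaN)  # "NaN"
```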
What is the type of the following vectors? Explain why they have that type.
Considering only the four (common) data types, what is R’s implicit type conversion hierarchy (from highest priority to lowest priority)?
Sta 323 - Spring 2026